A heart attack disrupts the normal flow of blood to the heart muscle, potentially causing severe damage or death if not treated promptly. It can lead to long-term health complications, reduce quality of life, and significantly impact daily activities and overall well-being. Despite the growing popularity of deep learning, several drawbacks persist, such as complexity and the limitation of single-model learning. In this paper, we introduce a residual learning-based feature fusion technique to achieve high accuracy in differentiating heart sounds with abnormal cardiac rhythms. Combining MobileNet with DenseNet201 for feature fusion leverages MobileNet's lightweight, efficient architecture alongside DenseNet201's dense connections, resulting in enhanced feature extraction and improved model performance at reduced computational cost. To further enhance the fusion, we employ residual learning to optimize the hierarchical features of abnormal heart sounds during training. The experimental results demonstrate that the proposed fusion method achieved an accuracy of 95.67% on the benchmark PhysioNet-2016 spectrogram dataset. To further validate the performance, we applied it to the BreakHis dataset at a magnification level of 100X. The results indicate that the model maintains robust performance on the second dataset, achieving an accuracy of 96.55%. This highlights its consistent performance, making it suitable for various applications.
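As an illustration of the fusion idea described above, the following minimal Keras sketch concatenates pooled MobileNet and DenseNet201 features and passes them through a residual block; the input size, projection width, and binary output head are assumptions, not the authors' implementation.

```python
# Hedged sketch: fusing MobileNet and DenseNet201 spectrogram features with a
# residual (skip) connection, assuming 224x224 RGB spectrogram images and a
# binary normal/abnormal output. Layer sizes are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNet, DenseNet201

inp = layers.Input(shape=(224, 224, 3))
mobile = MobileNet(include_top=False, pooling="avg", weights="imagenet")(inp)   # 1024-d vector
dense = DenseNet201(include_top=False, pooling="avg", weights="imagenet")(inp)  # 1920-d vector

fused = layers.Concatenate()([mobile, dense])            # fused feature vector
proj = layers.Dense(512, activation="relu")(fused)       # learned projection
shortcut = layers.Dense(512)(fused)                      # shortcut path
x = layers.Activation("relu")(layers.Add()([proj, shortcut]))  # residual fusion block
out = layers.Dense(1, activation="sigmoid")(x)

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```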
In-process damage to a cutting tool degrades the surface finish of the job shaped by machining and causes a significant financial loss. This stimulates the need for Tool Condition Monitoring (TCM) to assist in detecting failure before it progresses to a worse phase. Machine Learning (ML) based TCM has been extensively explored in the last decade. However, most of the research is now directed toward Deep Learning (DL). The "Deep" formulation, hierarchical compositionality, distributed representation, and end-to-end learning of neural nets need to be explored to create a generalized TCM framework that performs efficiently in the high-noise environment of cross-domain machining. With this motivation, the design of different CNN (Convolutional Neural Network) architectures such as AlexNet, ResNet-50, LeNet-5, and VGG-16 is presented in this paper. Real-time spindle vibrations corresponding to healthy and various faulty configurations of a milling cutter were acquired. This data was transformed into the time-frequency domain and further processed by the proposed architectures in graphical form, i.e., as spectrograms. The models were trained, tested, and validated on different datasets and showed promising results.
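The time-frequency transformation step can be sketched as below; this is a minimal illustration with an assumed sampling rate and a synthetic stand-in for the spindle-vibration signal, not the paper's pipeline.

```python
# Minimal sketch of the time-frequency step: turning a raw vibration trace into
# a spectrogram image that a CNN such as AlexNet or VGG-16 can consume.
# Sampling rate and window sizes are assumed values.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

fs = 10_000                                     # assumed sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)
vibration = np.sin(2 * np.pi * 120 * t) + 0.5 * np.random.randn(t.size)  # stand-in signal

f, tt, Sxx = spectrogram(vibration, fs=fs, nperseg=256, noverlap=128)
plt.pcolormesh(tt, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time [s]"); plt.ylabel("Frequency [Hz]")
plt.savefig("vibration_spectrogram.png", dpi=150)   # graphical input for the CNN
```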
A new perspective on and extension of the phase spectrogram (PS) and frequency spectrogram (FS) is presented in this paper. These representations result from the coupling of the power spectrogram and the short-time Fourier transform (STFT). The main contribution is the construction of the 3D phase spectrogram (3DPS) and the 3D frequency spectrogram (3DFS). These new tools allow such specific test signals as small-slope linear chirps, phase jumps, and small frequency jumps to be analyzed. An application to musical signal analysis is reported. The main objective is to detect small frequency and phase variations in order to characterize each type of sound attack without losing the amplitude information given by the power spectrogram.
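The following sketch shows, under assumed parameters, how the quantities above relate to the STFT: the phase spectrogram is its argument, and a frequency spectrogram can be approximated from the time derivative of the unwrapped phase.

```python
# Hedged sketch: power, phase and approximate frequency spectrograms from the
# STFT of a small-slope linear chirp test signal. Parameters are illustrative.
import numpy as np
from scipy.signal import stft

fs = 8_000
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2 * np.pi * (500 * t + 50 * t**2))        # small-slope linear chirp

f, tt, Z = stft(x, fs=fs, nperseg=512, noverlap=384)
power_spec = np.abs(Z) ** 2                          # power spectrogram
phase_spec = np.angle(Z)                             # phase spectrogram (PS)

hop = tt[1] - tt[0]
unwrapped = np.unwrap(phase_spec, axis=1)
freq_spec = np.diff(unwrapped, axis=1) / (2 * np.pi * hop)   # approximate frequency spectrogram (FS), Hz
```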
Cardiovascular diseases (CVDs) remain one of the foremost causes of death globally; hence the need for advanced automated diagnostic solutions for early detection and intervention. Traditional auscultation of cardiovascular sounds is heavily reliant on clinical expertise and subject to high variability. To counter this limitation, this study proposes an AI-driven classification system for cardiovascular sounds in which deep learning techniques are engaged to automate the detection of abnormal heartbeats. We employ FastAI vision-learner-based convolutional neural networks (CNNs), including ResNet, DenseNet, VGG, ConvNeXt, SqueezeNet, and AlexNet, to classify heart sound recordings. Instead of raw waveform analysis, the proposed approach transforms preprocessed cardiovascular audio signals into spectrograms, which are suited to capturing temporal and frequency-wise patterns. The models are trained on the PASCAL Cardiovascular Challenge dataset while taking into consideration recording variations, noise levels, and acoustic distortions. To demonstrate generalization, external validation was performed on Google's AudioSet Heartbeat Sound dataset, which is rich in cardiovascular sounds. Comparative analysis revealed that DenseNet-201, ConvNeXt Large, and ResNet-152 delivered superior performance to the other architectures, achieving an accuracy of 81.50%, a precision of 85.50%, and an F1-score of 84.50%. In the process, we performed statistical significance testing, such as the Wilcoxon signed-rank test, to validate performance improvements over traditional classification methods. Beyond the technical contributions, the research underscores clinical integration, outlining a pathway in which the proposed system can augment conventional electronic stethoscopes and telemedicine platforms in AI-assisted diagnostic workflows. We also discuss in detail issues of computational efficiency, model interpretability, and ethical considerations, particularly concerning algorithmic bias stemming from imbalanced datasets and the need for real-time processing in clinical settings. The study describes a scalable, automated system combining deep learning, spectrogram-based feature extraction, and external validation that can assist healthcare providers in the early and accurate detection of cardiovascular disease. AI-driven solutions can be viable in improving access, reducing delays in diagnosis, and ultimately easing the continued global burden of heart disease.
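A minimal FastAI-style sketch of such a workflow is given below, assuming the library's standard vision-learner API, spectrogram images organised in one folder per class, and an arbitrary backbone and epoch count; it is an illustration rather than the study's code.

```python
# Hedged sketch: transfer learning on heart-sound spectrogram images with a
# FastAI vision learner. The folder path, backbone and epoch count are assumptions.
from fastai.vision.all import *            # idiomatic fastai import
from torchvision.models import resnet50    # one of the backbone families compared above

dls = ImageDataLoaders.from_folder(
    "heart_spectrograms/",                 # assumed layout: one sub-folder per class
    valid_pct=0.2, seed=42, item_tfms=Resize(224))

learn = vision_learner(dls, resnet50, metrics=accuracy)
learn.fine_tune(5)                         # transfer learning on the spectrogram images
preds, targets = learn.get_preds()         # held-out predictions for comparison
```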
This study examines the variations in noise levels across various subway lines in Singapore and three other cities, and provides a detailed overview of the trends and factors influencing subway noise. Most of the equivalent sound pressure levels (Leq) in typical subway cabins across the Singapore subway lines are below 85 dBA, with some notable exceptions. These variations in noise levels are influenced by several factors, including rolling stock structure, track conditions, and environmental and aerodynamic factors. The spectrogram analysis indicates that the cabin noise is mostly concentrated below 1,000 Hz. This study also analyzes cabin noise in the subway systems of Suzhou, Seoul, and Tokyo to allow for broader comparisons. It studies the impact on cabin noise of factors such as rolling stock materials, track conditions (including the quality of the rails and the presence of curves or irregularities), and maintenance frequency.
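For reference, the equivalent sound pressure level quoted above can be computed from a calibrated pressure trace as sketched below; the A-weighting needed for dBA values is assumed to have been applied beforehand.

```python
# Worked sketch of the equivalent sound pressure level (Leq):
# Leq = 10*log10( mean(p^2) / p0^2 ) with p0 = 20 µPa. An A-weighting filter
# would be applied to the pressure signal first to obtain dBA values.
import numpy as np

def leq_db(pressure_pa: np.ndarray, p_ref: float = 20e-6) -> float:
    """Equivalent continuous sound level of a calibrated pressure trace in Pa."""
    return 10.0 * np.log10(np.mean(pressure_pa**2) / p_ref**2)

# Example: a 1 Pa RMS tone corresponds to roughly 94 dB.
fs = 48_000
t = np.arange(0, 1.0, 1 / fs)
p = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)       # 1 Pa RMS
print(round(leq_db(p), 1))                           # ~94.0
```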
Frequency-modulated continuous-wave radar enables non-contact and privacy-preserving recognition of human behavior. However, the accuracy of behavior recognition is directly influenced by the spatial relationship between human posture and the radar. To address the issue of low accuracy in behavior recognition when the human body is not directly facing the radar, a method combining the local outlier factor with Doppler information is proposed for the correction of multi-classifier recognition results. Initially, information such as the distance, velocity, and micro-Doppler spectrogram of the target is obtained using the fast Fourier transform and histogram of oriented gradients-support vector machine methods, followed by preliminary recognition. Subsequently, Platt scaling is employed to transform the recognition results into confidence scores, and finally, the Doppler-local outlier factor method is used to calibrate the confidence scores, with the result of the highest-confidence classifier taken as the recognition outcome. Experimental results demonstrate that this approach achieves an average recognition accuracy of 96.23% for comprehensive human behavior recognition in various orientations.
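Two of the building blocks named above, Platt-scaled SVM confidences and local outlier factor scores, can be sketched with scikit-learn as follows; the data shapes and the final calibration line are purely illustrative assumptions.

```python
# Hedged sketch: an SVM on HOG-style features with Platt-scaled confidence
# scores, plus a local outlier factor (LOF) score that could down-weight
# unreliable predictions. Feature dimensions and class counts are assumed.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))          # stand-in HOG descriptors of micro-Doppler maps
y_train = rng.integers(0, 4, size=200)        # four behaviour classes
X_test = rng.normal(size=(10, 64))

svm = SVC(kernel="rbf", probability=True).fit(X_train, y_train)  # probability=True -> Platt scaling
confidence = svm.predict_proba(X_test)                           # per-class confidence scores

lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
outlier_score = -lof.score_samples(X_test)    # larger -> more anomalous Doppler signature
adjusted = confidence.max(axis=1) / (1.0 + outlier_score)  # illustrative calibration only
```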
To solve the problem of mismatched features across experimental databases, which is a key issue in the field of cross-corpus speech emotion recognition, an auditory attention model based on Chirplets is proposed for feature extraction. First, in order to extract the spectral features, the auditory attention model is employed for variational emotion feature detection. Then, the selective attention mechanism model is proposed to extract the salient gist features, which show their relation to the expected performance in cross-corpus testing. Furthermore, Chirplet time-frequency atoms are introduced into the model. By forming a complete atom database, the Chirplet improves spectral feature extraction, including the amount of information captured. Samples from multiple databases have the characteristics of multiple components; the Chirplet thereby expands the scale of the feature vector in the time-frequency domain. Experimental results show that, compared with the traditional feature model, the proposed feature extraction approach with the prototypical classifier yields significant improvement in cross-corpus speech recognition. In addition, the proposed method is more robust to inconsistent sources of the training and testing sets.
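A Gaussian chirplet atom of the kind such an atom database is built from can be sketched as follows; all parameter values are assumptions chosen for illustration.

```python
# Illustrative sketch of a Gaussian chirplet time-frequency atom: a Gaussian
# envelope with a linearly varying instantaneous frequency. Parameter values
# (centre time/frequency, chirp rate, width) are assumptions.
import numpy as np

def chirplet(t, t_c=0.5, f_c=1000.0, chirp_rate=2000.0, sigma=0.05):
    """Complex Gaussian chirplet centred at time t_c and frequency f_c."""
    envelope = np.exp(-((t - t_c) ** 2) / (2 * sigma ** 2))
    phase = 2 * np.pi * (f_c * (t - t_c) + 0.5 * chirp_rate * (t - t_c) ** 2)
    return envelope * np.exp(1j * phase)

fs = 16_000
t = np.arange(0, 1.0, 1 / fs)
atom = chirplet(t)
# Projecting a speech frame onto a bank of such atoms (varying f_c, chirp_rate,
# sigma) expands the time-frequency feature vector described above.
coeff = abs(np.vdot(atom, atom)) / np.linalg.norm(atom) ** 2   # = 1.0, self-projection sanity check
```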
This paper addresses the problem of single-channel speech enhancement in adverse environments. A critical-band rate scale based on improved multi-band spectral subtraction is investigated in this study for the enhancement of single-channel speech. In this work, the whole speech spectrum is divided into different non-uniformly spaced frequency bands in accordance with the critical-band rate scale of the psycho-acoustic model, and spectral over-subtraction is carried out separately in each band. In addition, for the estimation of the noise in each band, an adaptive noise estimation approach is used that does not require explicit speech silence detection. The noise is estimated and updated by adaptively smoothing the noisy signal power in each band, with the smoothing parameter controlled by the a-posteriori signal-to-noise ratio (SNR). For the performance analysis of the proposed algorithm, objective measures such as SNR, segmental SNR, and perceptual evaluation of speech quality are conducted for a variety of noises at different SNR levels. The speech spectrograms and objective evaluations of the proposed algorithm are compared with other standard speech enhancement algorithms, showing that the musical structure of the remnant noise and the background noise are better suppressed by the proposed algorithm.
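A single-band sketch of the over-subtraction and SNR-controlled noise smoothing described above is given below; the full method applies this per critical band, and the factors used here are assumed values.

```python
# Hedged single-band sketch of spectral over-subtraction with adaptive,
# a-posteriori-SNR-controlled noise smoothing. The over-subtraction factor and
# spectral floor are assumed values; the real method works per critical band.
import numpy as np
from scipy.signal import stft, istft

fs = 8_000
noisy = np.random.randn(fs)                         # stand-in noisy speech signal
f, t, Y = stft(noisy, fs=fs, nperseg=256)
power = np.abs(Y) ** 2

noise_est = power[:, 0].copy()                      # initialise from the first frame
alpha_over, beta_floor = 4.0, 0.01                  # over-subtraction factor, spectral floor
clean_power = np.empty_like(power)
for m in range(power.shape[1]):
    post_snr = power[:, m] / (noise_est + 1e-12)    # a-posteriori SNR
    smooth = np.clip(post_snr / (1.0 + post_snr), 0.5, 0.98)  # high SNR -> keep old noise estimate
    noise_est = smooth * noise_est + (1.0 - smooth) * power[:, m]
    clean_power[:, m] = np.maximum(power[:, m] - alpha_over * noise_est,
                                   beta_floor * power[:, m])

X_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))   # reuse the noisy phase
_, enhanced = istft(X_hat, fs=fs, nperseg=256)
```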
Cat vocal behavior, in particular the vocal and social behavior of feral cats, is poorly understood, as are the differences between feral and fully domestic cats. The relationship between feral cat social and vocal behavior is important because of the markedly different ecology of feral and domestic cats, and enhanced comprehension of the repertoire and potential information content of feral cat calls can provide both a better understanding of the domestication and socialization process and improved welfare for feral cats undergoing adoption. Previous studies have used conflicting classification schemes for cat vocalizations, often relying on onomatopoeic or popular descriptions of call types (e.g., "miow"). We studied the vocalizations of 13 unaltered domestic cats that complied with the behavioral definition we used to distinguish feral cats from domestic ones. A total of 71 acoustic units were extracted and visually analyzed for the construction of a hierarchical classification of vocal sounds based on acoustic properties. We identified 3 major categories (tonal, pulse, and broadband) that further break down into 8 subcategories, and show a high degree of reliability when sounds are classified blindly by independent observers (Fleiss' kappa K = 0.863). Due to the limited behavioral contexts in this study, additional subcategories of cat vocalizations may be identified in the future, but our hierarchical classification system allows for the addition of new categories and subcategories as they are described. This study shows that cat vocalizations are diverse and complex, and provides an objective and reliable classification system that can be used in future studies.
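The inter-observer agreement statistic reported above (Fleiss' kappa) can be computed as in the following sketch, using statsmodels on a toy ratings matrix with hypothetical category labels.

```python
# Hedged sketch of Fleiss' kappa on a toy ratings matrix: each row is an
# acoustic unit, each column an observer's category label. The labels and
# ratings below are purely illustrative.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# 6 acoustic units rated by 3 independent observers into 3 categories
# (0 = tonal, 1 = pulse, 2 = broadband) - illustrative assignment only.
ratings = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 1],
    [0, 0, 1],
    [2, 2, 2],
    [1, 1, 1],
])
counts, _ = aggregate_raters(ratings)    # units x categories count table
print(round(fleiss_kappa(counts), 3))    # 1.0 would indicate perfect agreement
```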
A potential concept that could be effective for multiple applications is the "cyber-physical system" (CPS). The Internet of Things (IoT) has evolved as a research area, presenting new challenges in obtaining valuable data through environmental monitoring. Existing work solely focuses on classifying the audio system of a CPS without utilizing feature extraction. This study employs a deep learning method, CNN-LSTM, and two-way feature extraction to classify audio systems within CPS. The primary objective of this system, which is built upon a convolutional neural network (CNN) with Long Short-Term Memory (LSTM), is to analyze the vocalization patterns of two different species of anurans. It has been demonstrated that CNNs, when combined with mel spectrograms for sound analysis, are suitable for classifying ambient noises. Initially, the data is augmented and preprocessed. Next, the mel spectrogram features are extracted through two-way feature extraction: first, Principal Component Analysis (PCA) is utilized for dimensionality reduction, followed by transfer learning for audio feature extraction. Finally, the classification is performed using the CNN-LSTM process. This methodology can potentially be employed for categorizing various biological acoustic objects and analyzing biodiversity indexes in natural environments, resulting in high classification accuracy. The study highlights that this CNN-LSTM approach enables cost-effective and resource-efficient monitoring of large natural regions. The dissemination of updated CNN-LSTM models across distant IoT nodes is facilitated flexibly and dynamically through the utilization of CPS.
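The front end described above can be sketched as follows under assumed parameters: a mel spectrogram, PCA-reduced frames, and a window shaped for a CNN-LSTM; the file name is hypothetical.

```python
# Hedged sketch of the feature front end: mel spectrogram of an anuran call,
# frames reduced with PCA, and a window shaped for a CNN-LSTM classifier.
# The recording file name and all dimensions are assumptions.
import numpy as np
import librosa
from sklearn.decomposition import PCA

y, sr = librosa.load("anuran_call.wav", sr=22050)           # hypothetical recording
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)               # (64 mel bands, T frames)

frames = mel_db.T                                           # one row per time frame
pca = PCA(n_components=16)
reduced = pca.fit_transform(frames)                         # (T, 16) reduced sequence

# A CNN-LSTM would consume fixed-length windows of this sequence, e.g. (batch, 32, 16, 1).
window = reduced[:32][np.newaxis, ..., np.newaxis]
print(window.shape)                                         # (1, 32, 16, 1)
```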
Problem statement: The results of a study of seismoacoustic emission arising in a porous two-phase geological environment under acoustic influence are presented. Acoustic emission arising in oil-field reservoirs, observed in wells, is considered. The regularity of the emission processes, which manifests itself in the form of discrete spectra of signals similar to oscillations of nonlinearly coupled oscillators, is shown. The spectra have characteristics specific to each type of rock. Applied method and design: An algorithm is developed for modeling the resonant acoustic response of a porous, fluid-saturated reservoir with hierarchical structure and plastic properties to acoustic-frequency excitation. The algorithm is formulated as an iterative process for the solution of integral and integro-differential equations. The frequencies that serve as parameters of the direct problem are taken from the spectra of observed acoustic emission data in the oil wells. Typical results: For the first time, a relation was found between the resonant frequencies of the acoustic emission and the plastic properties; these frequency values were used in the algorithm for modeling the propagation of longitudinal waves in the fluid-saturated nonlinear plastic environment. Concluding note (practical value/implications): The analysis of these emission processes can serve as a source of information about the filtration-capacitive properties of productive porous-type reservoirs with a hierarchical structure. It is applied to practical data from oil fields of Western Siberia.
The study is based on an observation of the pronunciation of a group of undergraduate students of English as a Second Language (ESL) whose mother tongue is Arabic and who have no formal training in the spoken variety of English other than that received in the classroom. The study of the acquisition of the pronunciation of consonant clusters at the morphological, and particularly the morphophonological, level indicates that the learners are sensitive to the syllabic structure, viz., cccv type and cccvcc type, at word-initial, medial, and final positions. Samples of words with different consonant clusters were tested with a homogeneous group of students. Words of identical morphological categories were used as the data to test the students' level of perception. These were analyzed using Speech Analyzer Version 2.5. The data include consonant clusters such as plosive-fricative, plosive-plosive, fricative-fricative, and plosive-fricative-trill/liquid combinations. The results varied according to the perceptual and articulatory abilities of the learners. It was observed that the perception and acquisition of three-consonant clusters of the plosive-plosive type word-initially, plosive-plosive combinations word-finally, and the plosive-fricative type posed more difficulty for the learners. The tendency to drop one of the consonants of the cluster was more pronounced with syllables ending in plural morphemes and those ending in -mp, -pt, -kt, -nt, -bt, etc. Difficulty was also noticed with initial plosive+/r/ and plosive+/l/ combinations, especially in word-initial positions. Across syllable boundaries, these clusters are almost inaudible with some speakers. The difficulty in the articulation of these consonant clusters can be attributed to mother tongue influence, as in the case of many other features. The results of the analysis have a pedagogical implication for the use of such words with consonant clusters to teach reading skills to undergraduate students in the present setting and to promote self-learning through the use of speech tools.
Acoustic array sensor devices for partial discharge detection are widely used in power equipment inspection, with the advantages of non-contact operation and precise positioning compared with partial discharge detection methods such as the ultrasonic method and the pulse current method. However, due to the sensitivity of the acoustic array sensor and the influence of interference at the equipment operation site, partial discharge type diagnosis from the phase resolved partial discharge (PRPD) map by the acoustic array sensor device might occasionally present incorrect results, thus affecting the power equipment operation and maintenance strategy. The acoustic array sensor detection device for power equipment developed in this paper applies an equal-area multi-arm spiral array design model together with a machine learning fast Fourier transform CLEAN (FFT-CLEAN) sound source localization and identification algorithm to avoid the interference factors affecting noise acquisition systems that use a single microphone and a conventional beamforming algorithm, improves the spatial resolution of the acoustic array sensor device, and proposes an analysis and diagnosis method for discharge type based on the acoustic spectrogram. This method can effectively reduce system misjudgment caused by factors such as the resolution of the acoustic imaging device and the time-domain pulses of the digital signal, and reduce the false alarm rate of the acoustic array sensor device. The proposed method is tested on power cables, and its effectiveness is proved by laboratory and field verification.
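For context, the conventional delay-and-sum beamformer that the FFT-CLEAN approach refines can be sketched as below; the array geometry and sampling rate are assumptions.

```python
# Hedged sketch of a conventional delay-and-sum beamformer (the baseline the
# FFT-CLEAN approach above improves on): compensate per-microphone delays for a
# candidate direction and sum. The 4-mic line array geometry is an assumption.
import numpy as np

c = 343.0                                   # speed of sound, m/s
fs = 48_000
mic_xy = np.array([[0.00, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])

def delay_and_sum(signals, azimuth_deg):
    """signals: (n_mics, n_samples) array; returns the beamformed signal."""
    direction = np.array([np.cos(np.radians(azimuth_deg)),
                          np.sin(np.radians(azimuth_deg))])
    delays = mic_xy @ direction / c                     # seconds, relative to the origin
    shifts = np.round(delays * fs).astype(int)
    out = np.zeros(signals.shape[1])
    for sig, s in zip(signals, shifts):
        out += np.roll(sig, -s)                         # integer-sample alignment only
    return out / signals.shape[0]

# Scanning azimuth_deg over a grid and picking the direction with the highest
# output power gives a coarse acoustic map of the discharge source.
```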
Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not take into account a substantial amount of time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations, extracted in segments from an audio stream. The approach presented firstly transforms the segmented acoustic scenes into bump and Morse scalograms, as well as spectrograms; secondly, the spectrograms or scalograms are fed into pre-trained convolutional neural networks; thirdly, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer; finally, predictions from these three systems are fused by a margin sampling value strategy. We then evaluate the proposed approach using the acoustic scene classification data set of the 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, an accuracy of 64.0% from bidirectional gated recurrent neural networks is obtained when fusing the spectrogram and the bump scalogram, which is an improvement on the 61.0% baseline result provided by the DCASE 2017 organisers. This result shows that extracted bump scalograms are capable of improving the classification accuracy when fused with a spectrogram-based system.
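The scalogram step can be sketched as follows with PyWavelets, using a complex Morlet wavelet as a stand-in for the bump and Morse wavelets named above; segment length, sampling rate, and scale grid are assumptions.

```python
# Hedged sketch of a scalogram via the continuous wavelet transform. A complex
# Morlet wavelet is used here as a stand-in for the bump/Morse wavelets in the
# text; segment length, sampling rate and scales are assumed values.
import numpy as np
import pywt

fs = 44_100
segment = np.random.randn(fs)                       # stand-in 1-s acoustic-scene segment
scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(segment, scales, "cmor1.5-1.0", sampling_period=1 / fs)
scalogram = np.abs(coeffs) ** 2                     # (scales, samples) image fed to the CNN
```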
On March 26, 2010, an underwater explosion (UWE) led to the sinking of the ROKS Cheonan. The official Multinational Civilian-Military Joint Investigation Group (MCMJIG) report concluded that the cause of the underwater explosion was a 250 kg net explosive weight (NEW) detonation at a depth of 6-9 m from a DPRK "CHT-02D" torpedo. Kim and Gitterman (2012a) determined the NEW and seismic magnitude as 136 kg at a depth of approximately 8 m and 2.04, respectively, using basic hydrodynamics based on theoretical and experimental methods as well as spectral analysis and seismic methods. The purpose of this study was to clarify the cause of the UWE via more detailed methods using bubble dynamics and simulation of propellers as well as forensic seismology. Regarding the observed bubble pulse period of 0.990 s, periods of 0.976 s and 1.030 s were found for a 136 kg NEW at a detonation depth of 8 m using the boundary element method (BEM) and 3D bubble shape simulations derived for a 136 kg NEW detonation at a depth of 8 m, approximately 5 m portside from the hull centerline. Here we show through analytical equations, models, and 3D bubble shape simulations that the most probable cause of this underwater explosion was a 136 kg NEW detonation at a depth of 8 m attributable to an ROK littoral "land control" mine (LCM).
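As a rough cross-check of the quoted bubble-pulse periods, the classical Rayleigh-Willis similitude relation (not the BEM simulations used in the study) gives a comparable value; the constant and hydrostatic-head term below are approximate and depend on the explosive and reference used.

```python
# Worked check with the classical Rayleigh-Willis relation
# T ≈ K * W^(1/3) / (D + 10)^(5/6), W in kg TNT-equivalent, D in m, K ≈ 2.1.
# The constant and the 10 m hydrostatic-head term are assumed approximations.
def bubble_period(charge_kg: float, depth_m: float, k: float = 2.11) -> float:
    return k * charge_kg ** (1 / 3) / (depth_m + 10.0) ** (5 / 6)

print(round(bubble_period(136.0, 8.0), 3))   # ≈ 0.98 s with these assumptions,
                                             # consistent with the 0.976-1.030 s range above
```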
Cavitation in axial piston pumps threatens the reliability and safety of the overall hydraulic system. The vibration signal can reflect the cavitation conditions in axial piston pumps, and it has been combined with machine learning to detect pump cavitation. However, the vibration signal usually contains noise in real working conditions, which raises concerns about the accurate recognition of cavitation in noisy environments. This paper presents an intelligent method to recognise cavitation in axial piston pumps in noisy environments. First, we train a convolutional neural network (CNN) using spectrogram images transformed from raw vibration data under different cavitation conditions. Second, we employ the technique of gradient-weighted class activation mapping (Grad-CAM) to visualise class-discriminative regions in the spectrogram image. Finally, we propose a novel image processing method based on the Grad-CAM heatmap to automatically remove entrained noise and enhance class features in the spectrogram image. The experimental results show that the proposed method greatly improves the diagnostic performance of the CNN model in noisy environments. The classification accuracy of cavitation conditions increases from 0.50 to 0.89 and from 0.80 to 0.92 at signal-to-noise ratios of 4 and 6 dB, respectively.
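The heatmap-guided cleaning idea can be sketched as follows, assuming a Grad-CAM heatmap has already been computed and resized to the spectrogram's shape with values in [0, 1]; the threshold and attenuation factor are assumptions, not the paper's exact procedure.

```python
# Hedged sketch of heatmap-guided spectrogram cleaning: keep the bins Grad-CAM
# marks as class-discriminative and attenuate the rest. The heatmap is assumed
# to be precomputed; threshold and attenuation values are assumptions.
import numpy as np

def clean_spectrogram(spec_db: np.ndarray, cam: np.ndarray,
                      threshold: float = 0.4, attenuation: float = 0.1) -> np.ndarray:
    """Suppress time-frequency bins with low Grad-CAM relevance."""
    mask = np.where(cam >= threshold, 1.0, attenuation)
    floor = spec_db.min()
    return floor + (spec_db - floor) * mask     # shrink low-relevance bins toward the floor

# Usage: cleaned = clean_spectrogram(spec_db, cam); the cleaned image is then
# fed back to the CNN, mirroring the noise-removal step described above.
```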
Recently, user recognition methods for authenticating personal identity have attracted significant attention, especially with the increased availability of various Internet of Things (IoT) services through fifth-generation technology (5G) based mobile devices. EMG signals, generated inside the body with unique individual characteristics, are being studied as part of next-generation user recognition methods. However, there is a limitation when applying EMG signals to user recognition systems, as the same operation needs to be repeated while maintaining a constant muscle strength over time. Hence, it is necessary to conduct research on multidimensional feature transformation that includes changes in frequency features over time. In this paper, we propose a user recognition system that applies the short-time Fourier transform (STFT) to EMG signals and converts them into EMG spectrogram images while adjusting the time-frequency resolution to extract multidimensional features. The proposed system is composed of a data pre-processing and normalization process, a spectrogram image conversion process, and a final classification process. The experimental results revealed that the proposed EMG spectrogram image-based user recognition system achieves 95.4% accuracy, which is 13% higher than the EMG signal-based system. This improvement in user recognition accuracy was achieved by using multidimensional features in the time-frequency domain.
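The STFT conversion with adjustable time-frequency resolution can be sketched as below; the sampling rate, window lengths, and the synthetic EMG trace are assumed values.

```python
# Hedged sketch: converting an EMG trace into dB-scaled spectrogram images
# while trading off time and frequency resolution via the window length.
# Sampling rate, window lengths and the stand-in signal are assumptions.
import numpy as np
from scipy.signal import stft

fs = 1_000                                        # assumed EMG sampling rate, Hz
emg = np.random.randn(4 * fs)                     # stand-in 4-s EMG recording

for nperseg in (64, 128, 256):                    # shorter window -> finer time resolution
    f, t, Z = stft(emg, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    image = 20 * np.log10(np.abs(Z) + 1e-12)      # dB-scaled spectrogram image
    print(nperseg, image.shape)                   # (freq bins, time frames) per setting
```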
Biometric authentication is a rapidly growing trend that has gained increasing attention in the last decades. It achieves safe access to systems using biometrics instead of traditional passwords. The utilization of a biometric in its original format makes it usable only once. Therefore, a cancelable biometric template should be used, so that it can be replaced when it is attacked. Cancelable biometrics aims to enhance the security and privacy of biometric authentication. Digital encryption is an efficient technique for generating cancelable biometric templates. In this paper, a highly secure encryption algorithm is proposed to ensure secure biometric data in verification systems. The biometric considered in this paper is the speech signal. The speech signal is transformed into its spectrogram. Then, the spectrogram is encrypted using two cascaded optical encryption algorithms. The first algorithm is Optical Scanning Holography (OSH), chosen for its efficiency as an encryption tool. The OSH-encrypted spectrogram is then encrypted using Double Random Phase Encoding (DRPE) by implementing two Random Phase Masks (RPMs). After the two cascaded optical encryption algorithms, the cancelable template is obtained. Verification is implemented through correlation estimation between enrolled and test templates in their encrypted format. If the correlation value is larger than a threshold value, the user is authorized. The threshold value can be determined from the genuine and impostor correlation distribution curves as the midpoint between the two curves. The optical encryption is implemented in software rather than with an optical setup. The efficiency of the proposed cancelable biometric algorithm is illustrated by the simulation results. It can improve biometric data security without deteriorating the recognition accuracy. Simulation results give close-to-zero values for the Equal Error Rate (EER) and close-to-one values for the Area under the Receiver Operator Characteristic (AROC) curve.
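The DRPE stage can be sketched as follows on a stand-in spectrogram array, with the OSH stage omitted and the random phase masks acting as keys; this is an illustration of the principle, not the paper's implementation.

```python
# Hedged sketch of Double Random Phase Encoding (DRPE) on a spectrogram array:
# one random phase mask in the spatial domain, a second in the Fourier domain.
# The OSH stage and exact mask generation are omitted; values are assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)               # the masks act as the secret keys
spectrogram = np.random.rand(128, 128)            # stand-in (OSH-encrypted) spectrogram

rpm1 = np.exp(2j * np.pi * rng.random(spectrogram.shape))
rpm2 = np.exp(2j * np.pi * rng.random(spectrogram.shape))

encrypted = np.fft.ifft2(np.fft.fft2(spectrogram * rpm1) * rpm2)   # cancelable template

def correlation_score(a: np.ndarray, b: np.ndarray) -> float:
    """Normalised correlation between two encrypted templates for verification."""
    a, b = a.ravel(), b.ravel()
    return float(np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(correlation_score(encrypted, encrypted))    # 1.0 for identical templates
```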
Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize these signals. One of the most commonly used methods for feature extraction is Mel Frequency Cepstral Coefficients (MFCCs). Recent research shows that MFCCs are successful in processing the voice signal with high accuracy. MFCCs represent a sequence of voice signal-specific features. This experimental analysis is proposed to distinguish Turkish speakers by extracting MFCCs from speech recordings. Since human perception of sound is not linear, after the filterbank step of the MFCC method, we converted the obtained log filterbanks into decibel (dB) feature-based spectrograms without applying the Discrete Cosine Transform (DCT). A new dataset was created with the spectrograms converted into 2-D arrays. Several learning algorithms were implemented with a 10-fold cross-validation method to detect the speaker. The highest accuracy of 90.2% was achieved using a Multi-layer Perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of the human voice as a new feature set.
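The dB filterbank features and tanh-MLP classifier described above can be sketched as follows; the file handling, feature dimensions, and stand-in data are assumptions.

```python
# Hedged sketch: mel filterbank energies converted to dB (the DCT step of MFCC
# extraction is skipped), flattened, and classified with a tanh MLP. The file
# name, feature sizes and the stand-in training data are assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def db_filterbank_features(path: str, n_mels: int = 40, frames: int = 100) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    db = librosa.power_to_db(mel)[:, :frames]          # dB features, no DCT applied
    return db.flatten()

# X: stacked per-recording feature vectors, y: speaker labels (assembly omitted).
X = np.vstack([np.random.randn(40 * 100) for _ in range(20)])   # stand-in features
y = np.repeat(np.arange(4), 5)                                   # four speakers, five clips each

clf = MLPClassifier(hidden_layer_sizes=(128,), activation="tanh", max_iter=500)
clf.fit(X, y)
```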