Because pixel values of foggy images are irregularly higher than those of images captured in normal weather (clear images), it is difficult to extract and express their texture. No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images. We investigated this relationship and propose a generative adversarial network (GAN) for foggy image semantic segmentation (FISS GAN), which contains two parts: an edge GAN and a semantic segmentation GAN. The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN. The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images. Experiments on the Foggy Cityscapes and Foggy Driving datasets indicated that FISS GAN achieved state-of-the-art performance.
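To make the role of the edge branch concrete, the sketch below computes a gradient-magnitude edge map with a fixed Sobel filter — a hand-crafted stand-in for the auxiliary edge signal that the edge GAN learns to generate (the function name and toy image are illustrative assumptions, not from the paper).

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map: the kind of auxiliary edge signal an
    edge branch could supply to a segmentation network, here computed
    with a fixed Sobel filter rather than a learned generator."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            # Combine horizontal and vertical gradient responses.
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

# A vertical step edge produces a strong response along the boundary
# and none in the flat region.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
print(edges.max() > 0, edges[:, 0].max() == 0)
```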
Image semantic segmentation is an essential technique for studying human behavior through image data. This paper proposes an image semantic segmentation method for human behavior research. First, an end-to-end convolutional neural network architecture is proposed, consisting of a depth-separable, jump-connected fully convolutional network and a conditional random field network. Jump-connected convolution is used to classify each pixel in the image, yielding an image semantic segmentation method based on a convolutional neural network. A conditional random field network is then used to improve the segmentation of images of human behavior, and linear and nonlinear modeling methods based on the conditional-random-field-refined segmentation are proposed. Finally, the proposed segmentation network is used to semantically segment the input image data, obtaining the contour features of the person; the approach is also applied to segmenting images in the medical field. The experimental results show that the image semantic segmentation method is effective. It offers a new way to use image data to study human behavior and can be extended to other research areas.
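The CRF refinement stage can be illustrated with a toy mean-field-style update on a 1-D chain of pixels, where each pixel's class distribution is pulled toward its neighbors'. This is a simplified stand-in for CRF post-processing in general, not the paper's CRF network; all names and weights are assumptions.

```python
import numpy as np

def crf_smooth(unary, pairwise_weight=1.0, iters=5):
    """Mean-field-style refinement of per-pixel class probabilities on a
    1-D chain: neighbor agreement is repeatedly folded back into each
    pixel's distribution, a toy stand-in for CRF post-processing."""
    q = unary / unary.sum(1, keepdims=True)
    for _ in range(iters):
        # Message from neighbors: sum of adjacent class distributions.
        msg = np.zeros_like(q)
        msg[1:] += q[:-1]
        msg[:-1] += q[1:]
        logits = np.log(unary) + pairwise_weight * msg
        e = np.exp(logits - logits.max(1, keepdims=True))
        q = e / e.sum(1, keepdims=True)
    return q

# A noisy pixel (index 2) surrounded by confident class-0 pixels.
unary = np.array([[0.9, 0.1]] * 2 + [[0.45, 0.55]] + [[0.9, 0.1]] * 2)
refined = crf_smooth(unary)
print(refined[2, 0] > 0.5)   # neighbor agreement flips the noisy pixel
```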
Automatic image annotation has been an active research topic in computer vision and pattern recognition for decades. A two-stage automatic image annotation method based on a Gaussian mixture model (GMM) and a random walk model (abbreviated as GMM-RW) is presented. To start with, a GMM fitted by the rival penalized expectation maximization (RPEM) algorithm is employed to estimate the posterior probability of each annotation keyword. Subsequently, a random walk process over the constructed label similarity graph is implemented to further mine the potential correlations of the candidate annotations and refine the results, which plays a crucial role in semantics-based image retrieval. The contributions of this work are multifold. First, a GMM is exploited to capture the initial semantic annotations; in particular, the RPEM algorithm is utilized to train the model, which can determine the number of GMM components automatically. Second, a label similarity graph is constructed by a weighted linear combination of label similarity and visual similarity of the images associated with the corresponding labels, which efficiently avoids the problems of polysemy and synonymy during the annotation process. Third, the random walk is implemented over the constructed label graph to further refine the candidate set of annotations generated by the GMM. Experiments conducted on the standard Corel5k dataset demonstrate that GMM-RW is significantly more effective and efficient than several state-of-the-art methods in the task of automatic image annotation.
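The refinement stage can be sketched as a random walk with restart: propagated scores are repeatedly mixed with the initial GMM scores until correlated labels reinforce each other. The restart weight and iteration count below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_walk_refine(scores, similarity, restart=0.15, iters=50):
    """Refine initial annotation scores by a random walk with restart
    over a label similarity graph (a sketch of the GMM-RW refinement
    idea; hyperparameters are illustrative)."""
    # Row-normalize the similarity matrix into a transition matrix.
    P = similarity / similarity.sum(axis=1, keepdims=True)
    r = scores.copy()
    for _ in range(iters):
        # Each step mixes propagated scores with the initial GMM scores.
        r = (1 - restart) * P.T @ r + restart * scores
    return r

# Toy example: three labels, where labels 0 and 1 are strongly related.
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
init = np.array([0.9, 0.1, 0.5])   # initial posterior-like scores
refined = random_walk_refine(init, sim)
print(refined[1] > init[1])   # label 1 is boosted by its correlation with label 0
```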
Coal-rock interface identification technology is pivotal for automatically adjusting the shearer's cutting drum during coal mining; however, it has also been a technical bottleneck hindering the advancement of intelligent coal mining. This study addresses the poor accuracy of current coal-rock identification technology on fully mechanized working faces, coupled with the limited availability of coal-rock datasets. The loss function of the SegFormer model was enhanced, the model's hyperparameters and learning rate were adjusted, and an automatic recognition method for coal-rock interfaces based on FL-SegFormer was proposed. Additionally, an experimental platform was constructed to simulate the dusty environment during coal-rock cutting by the shearer, enabling the collection of coal-rock test image datasets. Morphology-based algorithms were employed to expand the coal-rock image datasets through image rotation, color dithering, and Gaussian noise injection, augmenting their diversity and applicability. As a result, a coal-rock image dataset comprising 8424 samples was generated. The findings demonstrate that the FL-SegFormer model achieved a mean intersection over union (MIoU) and mean pixel accuracy (MPA) of 97.72% and 98.83%, respectively. The FL-SegFormer model outperformed the other models in recognition accuracy, as evidenced by an MIoU exceeding 95.70% on the original images. Furthermore, the FL-SegFormer model using original coal-rock images was validated on the No. 15205 working face of the Yulin test mine in northern Shaanxi. The average error was only 1.77%, and the model operated at 46.96 frames per second, meeting the practical application and deployment requirements in underground settings. These results provide a theoretical foundation for achieving automatic and efficient mining with coal mining machines and the intelligent development of coal mines.
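The MIoU and MPA figures reported above follow standard definitions, which can be computed from a confusion matrix as in this sketch (generic formulas, not the authors' evaluation code):

```python
import numpy as np

def miou_mpa(pred, gt, num_classes):
    """Mean intersection over union and mean pixel accuracy from label
    maps, using the standard confusion-matrix definitions."""
    # Rows = ground truth, columns = prediction.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for g, p in zip(gt.ravel(), pred.ravel()):
        cm[g, p] += 1
    tp = np.diag(cm).astype(float)
    iou = tp / (cm.sum(0) + cm.sum(1) - tp)   # per-class intersection / union
    acc = tp / cm.sum(1)                      # per-class pixel accuracy
    return iou.mean(), acc.mean()

# Toy 2-class example: one mislabeled pixel out of four.
gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou, mpa = miou_mpa(pred, gt, 2)
print(round(miou, 4), round(mpa, 4))
```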
As simulation-informed design gains importance in addressing urban complexity, integrating urban imagery into interactive feedback and decision-making has become increasingly essential. However, this potential remains underused, as urban imagery is often treated as a supporting variable in urban research rather than a core layer of spatial intelligence, hindering informed strategies in city branding, resource allocation, and livability. This study develops a data-driven framework, the Street View Search Engine, which integrates urban imagery analysis with interactive exploration to advance human-centered insights into urban visual form. Based on 81,478 street view images collected in Hong Kong, China, a dataset comprising 19 visual features was first constructed to represent urban visual information across three categories: physical, impression, and isovist. Subsequently, the self-organizing map machine learning algorithm was employed to train on the dataset, producing a visualized “data landscape” that re-organizes street views according to their visual similarities. Third, building on the data landscape, this study develops the Street View Search Engine framework to conduct three main tasks: define visual foundations, comprehend streetscape morphology, and evaluate regional visual schemes. These tasks combine general-use exploration with research-oriented analysis: a web-based platform was developed to support general-use exploration (http://47.113.226.77/project1/#/), while various data processing methods were employed to enable in-depth professional investigations. By transforming raw data into a visualizable, computable, and interactive urban imagery system, this study paves the way for evidence-based interventions, strategic resource allocation, and greater public engagement in urban planning.
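The self-organizing map step — mapping each street view's feature vector to a grid so that visually similar views land on nearby units — can be sketched with a minimal SOM training loop. The grid size, learning rate, and neighborhood width below are illustrative assumptions, not the study's settings.

```python
import numpy as np

def train_som(data, grid=(4, 4), iters=500, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal self-organizing map: each feature vector updates its
    best-matching grid unit and that unit's neighbors, so similar
    inputs end up on nearby units (illustrative hyperparameters)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, data.shape[1]))
    # Grid coordinates, used by the Gaussian neighborhood function.
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(1))   # best-matching unit
        lr = lr0 * (1 - t / iters)
        sigma = sigma0 * (1 - t / iters) + 1e-3
        dist2 = ((coords - coords[bmu]) ** 2).sum(1)
        nb = np.exp(-dist2 / (2 * sigma ** 2))          # neighborhood weights
        weights += lr * nb[:, None] * (x - weights)
    return weights

# Two well-separated clusters should occupy different map regions.
data = np.vstack([np.zeros((20, 3)), np.ones((20, 3))])
w = train_som(data)
bmu_a = np.argmin(((w - data[0]) ** 2).sum(1))
bmu_b = np.argmin(((w - data[-1]) ** 2).sum(1))
print(bmu_a != bmu_b)
```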
Remote sensing images contain a wealth of geospatial information. To accurately identify different geospatial categories and extract relevant data, image semantic segmentation plays a crucial role. In recent years, deep learning technology has brought significant breakthroughs to the semantic segmentation of remote sensing images, markedly enhancing its performance. This paper investigates the application of deep learning technologies to remote sensing image semantic segmentation, covering both convolutional neural network (CNN)-based and Transformer-based methods. It conducts an in-depth comparison of their structural characteristics and applicable scenarios, summarizes the achievements and shortcomings of existing research, and provides technical references and theoretical support for future studies, thereby contributing to the further development of deep learning in remote sensing. The results indicate that CNN-based semantic segmentation methods still hold advantages in extracting local features and achieving efficient segmentation, whereas Transformers address CNNs' limitations in global context modeling and long-range dependency capture. Therefore, the collaborative integration of CNNs and Transformers will become an important research direction for enhancing model performance.
Weed management plays a crucial role in increasing crop yields. Semantic segmentation, which classifies each pixel in a camera image into categories such as crops, weeds, and background, is a widely used method in this context. However, conventional semantic segmentation methods rely solely on pixel information within the camera's field of view (FOV), hindering their ability to detect weeds outside the visible area. This limitation can lead to incomplete weed removal and inefficient herbicide application. Incorporating information beyond the FOV into crop and weed segmentation is therefore essential for effective herbicide usage; nevertheless, existing research on crop and weed segmentation has largely overlooked this limitation. To address this issue, we propose the knowledge distillation-based outpainting and semantic segmentation network (KDOSS-Net) for crop and weed images, a novel framework that enhances segmentation accuracy by leveraging information beyond the FOV. KDOSS-Net consists of two parts: the object prediction-guided outpainting and semantic segmentation network (OPOSS-Net), which serves as the teacher model by restoring areas outside the FOV and performing semantic segmentation, and the semantic segmentation without outpainting network (SSWO-Net), which serves as the student model, directly performing segmentation without outpainting. Through knowledge distillation (KD), the student model learns from the teacher's outputs, resulting in a lightweight yet highly accurate segmentation network suitable for deployment on agricultural robots with limited computing power. Experiments on three public datasets (Rice Seedling and Weed, CWFID, and BoniRob) yielded mean intersection over union (mIoU) scores of 0.6315, 0.7101, and 0.7524, respectively. These results demonstrate that KDOSS-Net achieves higher accuracy than existing state-of-the-art (SOTA) segmentation models while significantly reducing computational overhead. Furthermore, the weed information extracted by our method is automatically linked as input to the open-source large language and vision assistant (LLaVA), enabling a system that recommends optimal herbicide strategies tailored to the detected weed class.
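The distillation step, in which the student learns from the teacher's outputs, is commonly implemented as a KL divergence between temperature-softened class distributions. The sketch below shows this generic soft-target term; KDOSS-Net's exact loss may differ.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable temperature-softened softmax."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Generic soft-target distillation term: KL(teacher || student)
    over temperature-softened per-pixel class distributions, with the
    usual T^2 gradient rescaling."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float((p_t * (np.log(p_t) - np.log(p_s))).sum(-1).mean() * T * T)

teacher = np.array([[2.0, 0.5, -1.0]])          # teacher per-pixel logits
matched = kd_loss(teacher.copy(), teacher)       # identical student
off = kd_loss(np.array([[0.0, 0.0, 0.0]]), teacher)
print(matched, off > matched)
```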
Semantic image synthesis aims to generate high-quality images given semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and directly synthesize images in a single forward step. In this paper, semantic image synthesis is instead treated as an image denoising task and handled with a novel image-to-image diffusion model (IIDM).
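Treating synthesis as denoising rests on the standard forward diffusion process, which progressively blends an image with Gaussian noise. The sketch below implements the generic DDPM forward step q(x_t | x_0), not IIDM's specific conditioning scheme.

```python
import numpy as np

def q_sample(x0, t, betas, rng):
    """Forward diffusion q(x_t | x_0): blend a clean image with Gaussian
    noise according to the cumulative noise schedule (generic DDPM
    formulation)."""
    alphas_bar = np.cumprod(1.0 - betas)
    a = alphas_bar[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear noise schedule
x0 = rng.standard_normal((8, 8))
xt_early, _ = q_sample(x0, 10, betas, rng)
xt_late, _ = q_sample(x0, 999, betas, rng)
# Early steps stay close to the image; late steps are nearly pure noise.
corr_early = np.corrcoef(x0.ravel(), xt_early.ravel())[0, 1]
corr_late = np.corrcoef(x0.ravel(), xt_late.ravel())[0, 1]
print(corr_early > 0.9, abs(corr_late) < 0.5)
```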
To address the poor performance of low-illumination enhancement algorithms on unevenly illuminated images, a low-light image enhancement (LIME) algorithm based on a residual network is proposed. The algorithm constructs a deep network that uses residual modules to extract image feature information and semantic modules to extract image semantic information at different levels. Moreover, a composite loss function was designed for low-illumination image enhancement, dynamically evaluating the loss of an enhanced image across three factors: color, structure, and gradient. This ensures that the model enhances image features correctly according to the image semantics, so that the results better match human visual experience. Experimental results show that, compared with state-of-the-art algorithms, the semantic-driven residual low-light network (SRLLN) can effectively improve the quality of low-illumination images and achieves better subjective and objective evaluation indexes on different types of images.
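A composite loss of this kind — weighting color, structure, and gradient discrepancies — can be sketched as follows. The individual terms and weights are illustrative assumptions, not SRLLN's exact formulation.

```python
import numpy as np

def composite_loss(enhanced, target, w=(1.0, 1.0, 1.0)):
    """Three-term enhancement loss combining color, structural, and
    gradient differences (a sketch in the spirit of a composite loss;
    the exact terms in the paper may differ)."""
    color = np.abs(enhanced - target).mean()          # per-pixel color error
    # Structural term: difference of global means (a crude stand-in
    # for a structural-similarity measure).
    structure = (enhanced.mean() - target.mean()) ** 2
    # Gradient term: difference of horizontal/vertical finite differences.
    gx = np.abs(np.diff(enhanced, axis=1) - np.diff(target, axis=1)).mean()
    gy = np.abs(np.diff(enhanced, axis=0) - np.diff(target, axis=0)).mean()
    return w[0] * color + w[1] * structure + w[2] * (gx + gy)

rng = np.random.default_rng(1)
img = rng.random((16, 16))
loss_same = composite_loss(img, img)        # identical images
loss_diff = composite_loss(img * 0.5, img)  # globally darkened image
print(loss_same, loss_diff > loss_same)
```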
The increased number of free and open Sentinel satellite images has led to new applications of these data. Among them is the systematic classification of land cover/use types based on patterns of settlements or agriculture recorded by these images, in particular the identification and quantification of their temporal changes. In this paper, we present guidelines and practical examples of how to obtain rapid and reliable image patch labelling results, and how to validate them with data mining techniques for detecting these temporal changes and presenting them as classification maps and/or statistical analytics. This represents a new systematic validation approach for semantic image content verification. We focus on a number of scenarios proposed by the user community using Sentinel data. From a large number of potential use cases, we selected three main ones: forest monitoring, flood monitoring, and macro-economics/urban monitoring.
Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision. The recent application of deep representation learning has driven this field into a new stage of development. In this paper, we summarize three aspects of the progress of research on semantic image parsing, i.e., category-level semantic segmentation, instance-level semantic segmentation, and beyond segmentation. Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. Moreover, we present a comprehensive comparison of different benchmark datasets and evaluation metrics. Finally, we explore the future trends and challenges of semantic image parsing.
The explosive increase in the number of images on the Internet has brought with it the great challenge of how to effectively index, retrieve, and organize these resources. Assigning proper tags to visual content is key to the success of many applications such as image retrieval and content mining. Although recent years have witnessed many advances in image tagging, existing methods depend on high-quality, large-scale training data that are expensive to obtain. In this paper, we propose a novel semantic neighbor learning method based on user-contributed social image datasets that can be acquired from the Web's inexhaustible social image content. In contrast to existing image tagging approaches that rely on high-quality image-tag supervision, our neighbor learning method acquires weak supervision by progressive neighborhood retrieval from noisy and diverse user-contributed image collections. The retrieved neighbor images are not only visually alike and partially correlated but also semantically related. We offer a step-by-step and easy-to-use implementation of the proposed method. Extensive experimentation on several datasets demonstrates that the proposed method significantly outperforms others.
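At its simplest, neighbor-based tagging of this flavor reduces to labeling a query image by voting over its visually nearest user-tagged neighbors. The sketch below shows that baseline idea; the feature vectors and tags are toy data, and the paper's progressive neighborhood retrieval is considerably more elaborate.

```python
import numpy as np

def neighbor_vote_tags(query_feat, feats, tag_lists, k=3):
    """Tag a query image by majority vote over its k visually nearest
    neighbors — a simplified stand-in for progressive neighborhood
    retrieval over noisy user-tagged collections."""
    d = ((feats - query_feat) ** 2).sum(1)   # squared Euclidean distances
    votes = {}
    for idx in np.argsort(d)[:k]:
        for tag in tag_lists[idx]:
            votes[tag] = votes.get(tag, 0) + 1
    # Rank candidate tags by vote count, highest first.
    return sorted(votes, key=votes.get, reverse=True)

# Toy collection: three nearby "coastal" images and one distant "city" image.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
tags = [["beach", "sea"], ["beach", "sky"], ["sea"], ["city"]]
result = neighbor_vote_tags(np.array([0.05, 0.05]), feats, tags)
print(result)
```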
Ephemeral gullies are widely distributed in the hilly and gully region of the Loess Plateau and play a unique role in the slope-gully erosion system. Rapid and accurate identification of ephemeral gullies is important for understanding the distribution patterns and development trends of soil erosion on the Loess Plateau. Deep learning algorithms can quickly and accurately process large data samples to recognize ephemeral gullies in remote sensing images. Here, we investigated ephemeral gullies in the Zhoutungou watershed in the hilly and gully region of the Loess Plateau in China using satellite and unmanned aerial vehicle images, combined with deep learning image semantic segmentation models to realize automatic recognition and feature extraction. Using Accuracy, Precision, Recall, F1 value, and AUC, we compared the ephemeral gully recognition results and accuracy of the U-Net, R2U-Net, and SegNet image semantic segmentation models. The SegNet model ranked first, followed by the R2U-Net and U-Net models, for ephemeral gully recognition in the hilly and gully region of the Loess Plateau. The RMSE values between predicted and measured ephemeral gully length and width were 6.78 m and 0.50 m, respectively, indicating that the model has an excellent recognition effect. This study identified a fast and accurate method for ephemeral gully recognition in the hilly and gully region of the Loess Plateau based on remote sensing images, providing an academic reference and practical guidance for soil erosion monitoring and slope and gully management in the Loess Plateau region.
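The Accuracy, Precision, Recall, and F1 metrics used in the comparison follow their standard pixel-level definitions, sketched here for a binary gully/background mask (generic formulas, not the authors' code):

```python
def binary_metrics(pred, gt):
    """Pixel-level Accuracy, Precision, Recall, and F1 for a binary
    (gully / background) mask, using the standard definitions."""
    tp = sum(p == 1 and g == 1 for p, g in zip(pred, gt))
    tn = sum(p == 0 and g == 0 for p, g in zip(pred, gt))
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gt))
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gt))
    acc = (tp + tn) / len(gt)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

# Toy mask: 3 gully pixels in ground truth, one missed and one false alarm.
gt   = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
acc, prec, rec, f1 = binary_metrics(pred, gt)
print(acc, prec, rec, f1)
```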
Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this two-part paper identifies an innovative, but realistic, EO optical sensory image-derived semantics-enriched Analysis Ready Data (ARD) product-pair and process gold standard as a linchpin for the success of a new notion of Space Economy 4.0. To be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers, it is regarded as a necessary-but-not-sufficient "horizontal" (enabling) precondition for: (I) Transforming existing EO big raster-based data cubes at the midstream segment, typically affected by the so-called data-rich information-poor syndrome, into a new generation of semantics-enabled EO big raster-based numerical data and vector-based categorical (symbolic, semi-symbolic, or subsymbolic) information cube management systems, eligible for semantic content-based image retrieval and semantics-enabled information/knowledge discovery. (II) Boosting the downstream segment in the development of an ever-increasing ensemble of "vertical" (deep and narrow, user-specific and domain-dependent) value-adding information products and services, suitable for a potentially huge worldwide market of institutional and private end-users of space technology. For the sake of readability, this paper consists of two parts. In the present Part 1, background notions in the remote sensing metascience domain are first critically revised for harmonization across the multidisciplinary domain of cognitive science. In short, the keyword "information" is disambiguated into the two complementary notions of quantitative/unequivocal information-as-thing and qualitative/equivocal/inherently ill-posed information-as-data-interpretation. Moreover, the buzzword "artificial intelligence" is disambiguated into the two better-constrained notions of Artificial Narrow Intelligence and AGI, where the former is part-without-inheritance of the latter. Second, based on a better-defined and better-understood vocabulary of multidisciplinary terms, existing EO optical sensory image-derived Level 2/ARD products and processes are investigated at Marr's five levels of understanding of an information processing system. To overcome their drawbacks, an innovative, but realistic, EO optical sensory image-derived semantics-enriched ARD product-pair and process gold standard is proposed in the subsequent Part 2.
Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this paper consists of two parts. In the previous Part 1, existing EO optical sensory image-derived Level 2/Analysis Ready Data (ARD) products and processes were critically compared, to overcome their lack of harmonization/standardization/interoperability and their unsuitability for a new notion of Space Economy 4.0. In the present Part 2, the original contributions comprise, at Marr's five levels of system understanding: (1) An innovative, but realistic, EO optical sensory image-derived semantics-enriched ARD co-product pair requirements specification. First, in pursuit of third-level semantic/ontological interoperability, a novel ARD symbolic (categorical and semantic) co-product, known as a Scene Classification Map (SCM), adopts an augmented Cloud versus Not-Cloud taxonomy, whose Not-Cloud class legend complies with the standard fully-nested Land Cover Classification System's Dichotomous Phase taxonomy proposed by the United Nations Food and Agriculture Organization. Second, a novel ARD subsymbolic numerical co-product, specifically a panchromatic or multispectral EO image whose dimensionless digital numbers are radiometrically calibrated into a physical unit of radiometric measure, ranging from top-of-atmosphere reflectance to surface reflectance and surface albedo values, in a five-stage radiometric correction sequence. (2) An original ARD process requirements specification. (3) An innovative ARD processing system design (architecture), where stepwise SCM generation and stepwise SCM-conditional EO optical image radiometric correction alternate in sequence. (4) An original modular hierarchical hybrid (combined deductive and inductive) computer vision subsystem design, provided with feedback loops, where software solutions at Marr's two shallowest levels of system understanding, specifically algorithm and implementation, are selected from the scientific literature to benefit from their technology readiness level as proof of feasibility, required in addition to proven suitability. To be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers, the proposed EO optical sensory image-derived semantics-enriched ARD product-pair and process reference standard is highlighted as a linchpin for the success of a new notion of Space Economy 4.0.
Funding: Supported in part by the National Key Research and Development Program of China (2018YFB1305002), the National Natural Science Foundation of China (62006256), the Postdoctoral Science Foundation of China (2020M683050), the Key Research and Development Program of Guangzhou (202007050002), and the Fundamental Research Funds for the Central Universities (67000-31610134).
Funding: Supported by the Major Consulting and Research Project of the Chinese Academy of Engineering (2020-CQ-ZD-1), the National Natural Science Foundation of China (72101235), and the Zhejiang Soft Science Research Program (2023C35012).
Funding: Supported by the National Basic Research Program of China (No. 2013CB329502), the National Natural Science Foundation of China (No. 61202212), the Special Research Project of the Educational Department of Shaanxi Province of China (No. 15JK1038), and the Key Research Project of Baoji University of Arts and Sciences (No. ZK16047).
Funding: Funded by the National Natural Science Foundation of China (52004201, 52274143, 52204153) and the China Postdoctoral Science Foundation (2021T140551).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52308015), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2024A1515011214), and the Guangzhou Science and Technology Planning Project (Grant No. SL2024A04J01189).
Abstract: As simulation-informed design gains importance in addressing urban complexity, integrating urban imagery into interactive feedback and decision-making has become increasingly essential. However, this potential remains underused, as urban imagery is often treated as a supporting variable in urban research rather than a core layer of spatial intelligence, hindering informed strategies in city branding, resource allocation, and livability. This study develops a data-driven framework, the Street View Search Engine, which integrates urban imagery analysis with interactive exploration to advance human-centered insights into urban visual form. Based on 81,478 street view images collected in Hong Kong, China, a dataset comprising 19 visual features was first constructed to represent urban visual information across three categories: physical, impression, and isovist. Subsequently, the self-organizing map machine-learning algorithm was employed to train on the dataset, producing a visualized "data landscape" that re-organizes street views according to their visual similarities. Third, building on the data landscape, this study develops the Street View Search Engine framework to conduct three main tasks: define visual foundations, comprehend streetscape morphology, and evaluate regional visual schemes. These tasks combine general-use exploration with research-oriented analysis: a web-based platform was developed to support general-use exploration (http://47.113.226.77/project1/#/), while various data processing methods were employed to enable in-depth professional investigations. By transforming raw data into a visualizable, computable, and interactive urban imagery system, this study paves the way for evidence-based interventions, strategic resource allocation, and greater public engagement in urban planning.
Abstract: Remote sensing images contain a wealth of geospatial information. To accurately identify different geospatial categories and extract relevant data, image semantic segmentation plays a crucial role. In recent years, deep learning technology has brought significant breakthroughs to the semantic segmentation of remote sensing images, significantly enhancing its performance. This paper investigates the application of deep learning technologies in remote sensing image semantic segmentation, focusing on Convolutional Neural Network (CNN)-based and Transformer-based semantic segmentation methods. It conducts an in-depth comparison of their structural characteristics and applicable scenarios, summarizes the achievements and shortcomings of existing research, and provides technical references and theoretical support for future studies, thereby contributing to the further development of deep learning technology in the field of remote sensing. Research results indicate that CNN-based semantic segmentation methods still hold advantages in extracting local features and achieving efficient segmentation, whereas Transformers address CNNs' limitations in global context modeling and long-range dependency capture. Therefore, the collaborative integration of CNNs and Transformers will become an important research direction for enhancing model performance in the future.
Funding: This work was supported in part by the Ministry of Science and ICT (MSIT), Korea, through the Information Technology Research Center (ITRC) Support Program under Grant IITP-2025-RS-2020-II201789, and in part by the Artificial Intelligence Convergence Innovation Human Resources Development program, supervised by the Institute of Information & Communications Technology Planning & Evaluation (IITP), under Grant IITP-2025-RS-2023-00254592.
Abstract: Weed management plays a crucial role in increasing crop yields. Semantic segmentation, which classifies each pixel in an image captured by a camera into categories such as crops, weeds, and background, is a widely used method in this context. However, conventional semantic segmentation methods rely solely on pixel information within the camera's field of view (FOV), hindering their ability to detect weeds outside the visible area. This limitation can lead to incomplete weed removal and inefficient herbicide application. Incorporating information beyond the FOV in crop and weed segmentation is therefore essential for effective herbicide usage. Nevertheless, existing research on crop and weed segmentation has largely overlooked this limitation. To address this issue, we propose the knowledge distillation-based outpainting and semantic segmentation network (KDOSS-Net) for crop and weed images, a novel framework that enhances segmentation accuracy by leveraging information beyond the FOV. KDOSS-Net consists of two parts: the object prediction-guided outpainting and semantic segmentation network (OPOSS-Net), which serves as the teacher model by restoring areas outside the FOV and performing semantic segmentation, and the semantic segmentation without outpainting network (SSWO-Net), which serves as the student model, directly performing segmentation without outpainting. Through knowledge distillation (KD), the student model learns from the teacher's outputs, resulting in a lightweight yet highly accurate segmentation network that is suitable for deployment on agricultural robots with limited computing power. Experiments on three public datasets (Rice seedling and weed, CWFID, and BoniRob) yielded mean intersection over union (mIOU) scores of 0.6315, 0.7101, and 0.7524, respectively. These results demonstrate that KDOSS-Net achieves higher accuracy than existing state-of-the-art (SOTA) segmentation models while significantly reducing computational overhead. Furthermore, the weed information extracted using our method is automatically linked as input to the open-source large language and vision assistant (LLaVA), enabling the development of a system that recommends optimal herbicide strategies tailored to the detected weed class.
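Knowledge distillation of the kind described above is commonly implemented as a temperature-softened KL divergence between teacher and student logits, blended with the ordinary cross-entropy against labels. The NumPy sketch below shows the classic Hinton-style formulation for per-pixel class logits; it is illustrative only, as the abstract does not specify KDOSS-Net's exact distillation loss.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T),
    averaged over pixels. Logits: (num_pixels, num_classes); labels: class indices."""
    n = student_logits.shape[0]
    s = softmax(student_logits)
    ce = -np.log(s[np.arange(n), labels] + 1e-12).mean()   # hard-label term
    st = softmax(student_logits / T)                        # softened student
    tt = softmax(teacher_logits / T)                        # softened teacher
    kl = (tt * (np.log(tt + 1e-12) - np.log(st + 1e-12))).sum(axis=1).mean()
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

# Toy example: 4 pixels, 3 classes (crop / weed / background)
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 3))
teacher = rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 1])
print(distillation_loss(student, teacher, labels))
```

The T² factor keeps the gradient magnitude of the soft term comparable across temperatures, a standard choice in distillation setups.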
Funding: Supported by the National Natural Science Foundation for Young Scientists of China Award (No. 62106289).
Abstract: Semantic image synthesis aims to generate high-quality images given semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and directly synthesize images in a single forward step. In this paper, semantic image synthesis is treated as an image denoising task and is handled with a novel image-to-image diffusion model (IIDM).
Funding: Supported by the Promotion Plan of the Jiangsu Association for Science and Technology (TJ215039) and the Research Foundation of Nanjing University of Posts and Telecommunications (NY219076).
Abstract: To address the poor performance of low-illumination enhancement algorithms on unevenly illuminated images, a low-light image enhancement (LIME) algorithm based on a residual network was proposed. The algorithm constructs a deep network that uses residual modules to extract image feature information and semantic modules to extract image semantic information at different levels. Moreover, a composite loss function was designed for the low-illumination image enhancement process, which dynamically evaluates the loss of an enhanced image from three factors: color, structure, and gradient. It ensures that the model can correctly enhance image features according to the image semantics, so that the enhancement results better match the human visual experience. Experimental results show that, compared with state-of-the-art algorithms, the semantic-driven residual low-light network (SRLLN) can effectively improve the quality of low-illumination images and achieves better subjective and objective evaluation indexes on different types of images.
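A composite loss over color, structure, and gradient could be assembled as a weighted sum of three per-term distances. The sketch below is a hypothetical NumPy illustration, not SRLLN's actual formulation (which the abstract does not give): MSE stands in for the color term, one minus Pearson correlation for the structure term, and finite-difference gradient MSE for the gradient term.

```python
import numpy as np

def gradients(img):
    """Simple finite-difference image gradients for an H x W array."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return gx, gy

def composite_loss(enhanced, reference, w_color=1.0, w_struct=1.0, w_grad=1.0):
    """Weighted sum of color, structure, and gradient terms for a pair of
    grayscale images in [0, 1]. All three term definitions are stand-ins."""
    color = np.mean((enhanced - reference) ** 2)            # pixel-wise color fidelity
    corr = np.corrcoef(enhanced.ravel(), reference.ravel())[0, 1]
    struct = 1.0 - corr                                     # structural similarity proxy
    gxe, gye = gradients(enhanced)
    gxr, gyr = gradients(reference)
    grad = np.mean((gxe - gxr) ** 2 + (gye - gyr) ** 2)     # edge preservation
    return w_color * color + w_struct * struct + w_grad * grad

img = np.linspace(0, 1, 16).reshape(4, 4)
print(composite_loss(img, img))  # near 0 for identical images
```

The "dynamic" weighting described in the abstract could be realized by making the three weights functions of the semantic module's output rather than constants.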
Funding: The work was supported by the European Commission's H2020 CANDELA project under Grant Agreement No. 776193.
Abstract: The increased number of free and open Sentinel satellite images has led to new applications of these data. Among them is the systematic classification of land cover/use types based on patterns of settlements or agriculture recorded by these images, in particular, the identification and quantification of their temporal changes. In this paper, we present guidelines and practical examples of how to obtain rapid and reliable image patch labelling results and their validation based on data mining techniques for detecting these temporal changes, presented as classification maps and/or statistical analytics. This represents a new systematic validation approach for semantic image content verification. We focus on a number of different scenarios proposed by the user community using Sentinel data. From a large number of potential use cases, we selected three main ones, namely forest monitoring, flood monitoring, and macro-economics/urban monitoring.
Funding: This work was supported by the National Science Fund for Excellent Young Scholars (61622214), the National Natural Science Foundation of China (Grant Nos. 61702565 and 61622214), and the Guangdong Natural Science Foundation Project for Research Teams (2017A030312006), and was also sponsored by the CCF-Tencent Open Research Fund.
Abstract: Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing a structured representation of the input, has recently aroused widespread interest in the field of computer vision. The recent application of deep representation learning has driven this field into a new stage of development. In this paper, we summarize three aspects of the progress of research on semantic image parsing, i.e., category-level semantic segmentation, instance-level semantic segmentation, and beyond segmentation. Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. Moreover, we present a comprehensive comparison of different benchmark datasets and evaluation metrics. Finally, we explore the future trends and challenges of semantic image parsing.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61502094 and 61402099) and the Natural Science Foundation of Heilongjiang Province of China (Nos. F2016002 and F2015020).
Abstract: The explosive increase in the number of images on the Internet has brought with it the great challenge of how to effectively index, retrieve, and organize these resources. Assigning proper tags to visual content is key to the success of many applications such as image retrieval and content mining. Although recent years have witnessed many advances in image tagging, these methods are limited by their reliance on high-quality, large-scale training data that are expensive to obtain. In this paper, we propose a novel semantic neighbor learning method based on user-contributed social image datasets that can be acquired from the Web's inexhaustible social image content. In contrast to existing image tagging approaches that rely on high-quality image-tag supervision, we acquire weak supervision for our neighbor learning method through progressive neighborhood retrieval from noisy and diverse user-contributed image collections. The retrieved neighbor images are not only visually alike and partially correlated but also semantically related. We offer a step-by-step and easy-to-use implementation of the proposed method. Extensive experimentation on several datasets demonstrates that the proposed method significantly outperforms others.
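Neighborhood retrieval of the kind described above boils down to ranking gallery images by visual similarity to a query and then aggregating the neighbors' tags. The sketch below is a minimal illustration under assumed inputs (precomputed feature vectors and user-contributed tag lists); it is not the paper's progressive retrieval procedure, just the basic building block.

```python
import numpy as np

def retrieve_neighbors(query, gallery, k=3):
    """Indices of the k gallery images most similar to the query,
    by cosine similarity on visual feature vectors."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))[:k]

def vote_tags(neighbor_tags, k_top=2):
    """Aggregate neighbors' tags by simple frequency voting."""
    counts = {}
    for tag_list in neighbor_tags:
        for t in tag_list:
            counts[t] = counts.get(t, 0) + 1
    return sorted(counts, key=lambda t: -counts[t])[:k_top]

# Toy example: 4-dim "visual features" and noisy user tags for 4 gallery images
gallery = np.array([[1.0, 0, 0, 0], [0.9, 0.1, 0, 0], [0, 0, 1.0, 0], [0, 0, 0.9, 0.1]])
tags = [["beach", "sea"], ["beach", "sky"], ["city", "night"], ["city", "car"]]
idx = retrieve_neighbors(np.array([1.0, 0.05, 0, 0]), gallery, k=2)
print([tags[i] for i in idx], vote_tags([tags[i] for i in idx]))
```

Frequency voting is the crudest aggregation; the abstract's point is that even weak, noisy supervision of this form can be progressively refined into useful tag signals.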
Funding: This research was supported by the National Natural Science Foundation of China (41977064), the Fundamental Research Funds for the Central Universities (2452021158 and 2452021036), and the 111 Project of the Ministry of Education and the State Administration of Foreign Experts Affairs (B12007).
Abstract: Ephemeral gullies are widely distributed in the hilly and gully region of the Loess Plateau and play a unique role in the slope-gully erosion system. Rapid and accurate identification of ephemeral gullies is essential for understanding the distribution patterns and development trends of soil erosion on the Loess Plateau. Deep learning algorithms can quickly and accurately process the large data samples needed to recognize ephemeral gullies in remote sensing images. Here, we investigated ephemeral gullies in the Zhoutungou watershed in the hilly and gully region of the Loess Plateau in China using satellite and unmanned aerial vehicle images, combined with a deep learning image semantic segmentation model to realize automatic recognition and feature extraction. Using Accuracy, Precision, Recall, F1 value, and AUC, we compared the ephemeral gully recognition results and accuracy of the U-Net, R2U-Net, and SegNet image semantic segmentation models. The SegNet model ranked first, followed by the R2U-Net and U-Net models, for ephemeral gully recognition in the hilly and gully region of the Loess Plateau. The ephemeral gully length and width between predicted and measured values had RMSE values of 6.78 m and 0.50 m, respectively, indicating that the model has an excellent recognition effect. This study identified a fast and accurate method for ephemeral gully recognition in the hilly and gully region of the Loess Plateau based on remote sensing images, providing an academic reference and practical guidance for soil erosion monitoring and slope and gully management in the Loess Plateau region.
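The Accuracy, Precision, Recall, and F1 metrics used to compare the three models above have standard definitions over true/false positives and negatives. The NumPy sketch below (an illustrative helper, not the study's evaluation code) computes them for a binary gully / non-gully pixel mask.

```python
import numpy as np

def binary_metrics(pred, target):
    """Accuracy, Precision, Recall, and F1 for binary (gully = 1) pixel masks."""
    tp = np.sum((pred == 1) & (target == 1))  # gully pixels correctly detected
    tn = np.sum((pred == 0) & (target == 0))  # background correctly rejected
    fp = np.sum((pred == 1) & (target == 0))  # false alarms
    fn = np.sum((pred == 0) & (target == 1))  # missed gully pixels
    acc = (tp + tn) / max(tp + tn + fp + fn, 1)
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    return acc, prec, rec, f1

pred = np.array([1, 1, 0, 0, 1, 0])
gt = np.array([1, 0, 0, 0, 1, 1])
print(binary_metrics(pred, gt))  # (accuracy, precision, recall, F1)
```

AUC, the fifth metric in the study, requires ranking continuous scores rather than hard masks, so it is omitted from this sketch.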
Abstract: Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this two-part paper identifies an innovative, but realistic, EO optical sensory image-derived semantics-enriched Analysis Ready Data (ARD) product-pair and process gold standard as the linchpin for success of a new notion of Space Economy 4.0. To be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers, it is regarded as a necessary-but-not-sufficient "horizontal" (enabling) precondition for: (I) transforming existing EO big raster-based data cubes at the midstream segment, typically affected by the so-called data-rich information-poor syndrome, into a new generation of semantics-enabled EO big raster-based numerical data and vector-based categorical (symbolic, semi-symbolic or subsymbolic) information cube management systems, eligible for semantic content-based image retrieval and semantics-enabled information/knowledge discovery; (II) boosting the downstream segment in the development of an ever-increasing ensemble of "vertical" (deep and narrow, user-specific and domain-dependent) value-adding information products and services, suitable for a potentially huge worldwide market of institutional and private end-users of space technology. For the sake of readability, this paper consists of two parts. In the present Part 1, first, background notions in the remote sensing metascience domain are critically revised for harmonization across the multidisciplinary domain of cognitive science. In short, the keyword "information" is disambiguated into the two complementary notions of quantitative/unequivocal information-as-thing and qualitative/equivocal/inherently ill-posed information-as-data-interpretation. Moreover, the buzzword "artificial intelligence" is disambiguated into the two better-constrained notions of Artificial Narrow Intelligence as part-without-inheritance-of AGI. Second, based on a better-defined and better-understood vocabulary of multidisciplinary terms, existing EO optical sensory image-derived Level 2/ARD products and processes are investigated at the Marr five levels of understanding of an information processing system. To overcome their drawbacks, an innovative, but realistic, EO optical sensory image-derived semantics-enriched ARD product-pair and process gold standard is proposed in the subsequent Part 2.
Funding: ASAP 16 project call, project title: SemantiX - A cross-sensor semantic EO data cube to open and leverage essential climate variables with scientists and the public, Grant ID: 878939; ASAP 17 project call, project title: SIMS - Soil sealing identification and monitoring system, Grant ID: 885365.
Abstract: Aiming at the convergence between Earth observation (EO) Big Data and Artificial General Intelligence (AGI), this paper consists of two parts. In the previous Part 1, existing EO optical sensory image-derived Level 2/Analysis Ready Data (ARD) products and processes are critically compared, to overcome their lack of harmonization/standardization/interoperability and suitability in a new notion of Space Economy 4.0. In the present Part 2, original contributions comprise, at the Marr five levels of system understanding: (1) an innovative, but realistic, EO optical sensory image-derived semantics-enriched ARD co-product pair requirements specification. First, in the pursuit of third-level semantic/ontological interoperability, a novel ARD symbolic (categorical and semantic) co-product, known as the Scene Classification Map (SCM), adopts an augmented Cloud versus Not-Cloud taxonomy, whose Not-Cloud class legend complies with the standard fully-nested Land Cover Classification System's Dichotomous Phase taxonomy proposed by the United Nations Food and Agriculture Organization. Second, a novel ARD subsymbolic numerical co-product, specifically, a panchromatic or multispectral EO image whose dimensionless digital numbers are radiometrically calibrated into a physical unit of radiometric measure, ranging from top-of-atmosphere reflectance to surface reflectance and surface albedo values, in a five-stage radiometric correction sequence. (2) An original ARD process requirements specification. (3) An innovative ARD processing system design (architecture), where stepwise SCM generation and stepwise SCM-conditional EO optical image radiometric correction are alternated in sequence. (4) An original modular hierarchical hybrid (combined deductive and inductive) computer vision subsystem design, provided with feedback loops, where software solutions at the Marr two shallowest levels of system understanding, specifically, algorithm and implementation, are selected from the scientific literature, to benefit from their technology readiness level as proof of feasibility, required in addition to proven suitability. To be implemented in operational mode at the space segment and/or midstream segment by both public and private EO big data providers, the proposed EO optical sensory image-derived semantics-enriched ARD product-pair and process reference standard is highlighted as the linchpin for success of a new notion of Space Economy 4.0.