Funding: This work is supported by the European Union LandSense project, titled "A Citizen Observatory and Innovation Marketplace for Land Use and Land Cover Monitoring" (instrument: Horizon 2020; call identifier: SC5-17-2015), an innovation action demonstrating the concept of citizen observatories.
Abstract: This study explored the separability of land use/land cover (LULC) classes by machine-generated and user-generated Flickr photo tags (i.e., auto-tags and user-tags, respectively), based on an authoritative LULC dataset for San Diego County in the United States. Ten LULC types were derived from the authoritative dataset. Certain reclassified LULC types had abundant tags (e.g., parks) or a high tag density (e.g., commercial lands) compared with less populated types (e.g., agricultural lands). Highly weighted terms derived with a term frequency-inverse document frequency (TF-IDF) weighting scheme were helpful for identifying specific LULC types, especially commercial recreation lands (e.g., zoos). However, given the ten sets of tags retrieved from the corresponding ten LULC types, a latent semantic analysis showed that no single set of tags (all the tags located within one LULC type) could fully delineate its LULC type, owing to semantic overlaps between the sets.
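To make the TF-IDF weighting step concrete, the sketch below treats all tags found within one LULC class as a single document and ranks each class's terms; the class names and tag lists are invented placeholders, not the study's San Diego data, and sklearn is one of several ways to compute the weights.

```python
# Hypothetical illustration of TF-IDF tag weighting per LULC class.
# The tag lists are invented placeholders, not the study's data.
from sklearn.feature_extraction.text import TfidfVectorizer

# One "document" per LULC class: all photo tags falling inside that class.
lulc_tag_docs = {
    "parks": "hiking trail picnic lake trail sunset",
    "commercial_recreation": "zoo panda aquarium zoo rollercoaster",
    "agricultural": "farm orchard tractor harvest",
}

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(lulc_tag_docs.values())
terms = vectorizer.get_feature_names_out()

# For each class, list the highest-weighted terms: tags frequent in that
# class but rare in the others, e.g. "zoo" for commercial recreation.
for row, lulc in zip(tfidf.toarray(), lulc_tag_docs):
    top = sorted(zip(row, terms), reverse=True)[:3]
    print(lulc, [term for weight, term in top if weight > 0])
```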
Abstract: This research investigated the impact of social-networking-service posts on the formation of the image structure of cities, focusing on the spatial distribution of images and their content similarity. It aimed to delineate the image structure of cities created by numerous users, moving beyond traditional qualitative methods towards a more quantitative and objective approach based on big data. Taking central Tokyo as an example, the study extracted geotagged image data for 33 major railway station areas from Flickr's API (Application Programming Interface). Four coverage types of viewpoint distribution, namely planar, intersecting linear, linear, and nodal, were identified, each reflecting a distinct urban structure. Further investigation of the image contents, which primarily consisted of "urban landscape" and "landscape/street trees," showed that such contents significantly influenced the formation of the image structure of cities. The study concluded that as the number of photo posts increased and representative viewpoints became concentrated, the digital information received by users became more homogeneous, leading to strongly stereotyped images of urban landscapes. These findings highlight the role of social networking services in shaping perceptions of the urban environment and provide insights into the image structure of cities as formed by digital information.
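The geotagged-photo retrieval described above can be reproduced in outline with Flickr's public REST API (the flickr.photos.search method). The sketch below is a minimal, hedged example: the API key is a placeholder, the coordinates approximate Tokyo Station, and pagination and error handling are omitted.

```python
# Minimal sketch of retrieving geotagged photos around one station area
# via Flickr's REST API. The API key is a placeholder.
import requests

params = {
    "method": "flickr.photos.search",
    "api_key": "YOUR_FLICKR_API_KEY",   # placeholder
    "lat": 35.6812,                     # approximate Tokyo Station latitude
    "lon": 139.7671,                    # approximate Tokyo Station longitude
    "radius": 1,                        # search radius in kilometres
    "has_geo": 1,                       # geotagged photos only
    "extras": "geo,tags,date_taken",    # return coordinates with each photo
    "format": "json",
    "nojsoncallback": 1,
}
resp = requests.get("https://api.flickr.com/services/rest/", params=params)
for photo in resp.json()["photos"]["photo"][:5]:
    print(photo["id"], photo.get("latitude"), photo.get("longitude"))
```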
Abstract: Drone technology opens the door to major changes and opportunities in our society. But this technology, like many others, needs to be administered and regulated to prevent potential harm to the public. Therefore, national and local governments around the world have established regulations for operating drones, which ban drone use in specific locations or limit operation to qualified drone pilots only. This study reviews the types of restrictions on drone use specified in the federal drone regulations of the US, the UK, and France, and in state regulations within the US. The study also maps restricted areas and assesses compliance with these regulations by analyzing the spatial contribution patterns of three crowd-sourced drone portals, namely SkyPixel, Flickr, and DroneSpot, relative to the restricted areas. The analysis is performed both at the national level and at the state/regional level within each of the three countries, and statistical tests are conducted to compare compliance rates between the three drone portals. This study provides new insight into drone users' awareness of and compliance with drone regulations. This can help governments tailor information campaigns to increase awareness of drone regulations among drone users and determine where increased control and enforcement of drone regulations is necessary.
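A minimal sketch of the kind of compliance analysis described above: a point-in-polygon test flags photo locations inside a restricted area, and a two-proportion z-test compares compliance rates between two portals. The geometries, points, and counts are invented placeholders, not the study's datasets, and the specific statistical test is an assumption for illustration.

```python
# Toy version of the compliance check: flag drone-photo locations that
# fall inside a restricted polygon, then compare two portals' compliance
# rates with a two-proportion z-test. All values are invented.
from shapely.geometry import Point, Polygon
from statsmodels.stats.proportion import proportions_ztest

# A stand-in no-fly zone, e.g. a buffer around an airport.
restricted = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])

# Geotagged photo locations from one crowd-sourced drone portal.
photos = [Point(0.5, 0.5), Point(2.0, 2.0), Point(0.9, 0.1)]
violations = sum(restricted.contains(p) for p in photos)
print("compliance rate:", 1 - violations / len(photos))

# Compare compliance counts between two portals (illustrative numbers):
stat, pval = proportions_ztest(count=[870, 910], nobs=[1000, 1000])
print(f"z = {stat:.2f}, p = {pval:.3f}")
```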
Funding: This work was funded by Researchers Supporting Project number (RSPD2024R698), King Saud University, Riyadh, Saudi Arabia.
Abstract: The generation of descriptive captions for images has advanced significantly in recent years, owing to progress in deep learning techniques. Despite these advancements, thoroughly grasping image content and producing coherent, contextually relevant captions remains a substantial challenge. In this paper, we introduce a novel multimodal method for image captioning that integrates three powerful deep learning architectures: YOLOv8 (You Only Look Once) for robust object detection, EfficientNetB7 for efficient feature extraction, and Transformers for effective sequence modeling. The proposed model combines the strengths of YOLOv8 in detecting objects, the superior feature representation capabilities of EfficientNetB7, and the contextual understanding and sequential generation abilities of Transformers. We conduct extensive experiments on standard benchmark datasets to evaluate the effectiveness of our approach, demonstrating its ability to generate informative and semantically rich captions for diverse images and achieving state-of-the-art results in image captioning tasks. The significance of this approach lies in its ability to produce coherent and contextually relevant captions while achieving a comprehensive understanding of image content; the integration of the three architectures demonstrates the synergistic benefits of multimodal fusion. This work opens new avenues for research in multimodal deep learning and paves the way for more sophisticated and context-aware image captioning systems, with potential contributions to fields including human-computer interaction, computer vision, and natural language processing.
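The following PyTorch sketch shows one plausible way to wire the three named components together, not the paper's exact architecture: the fusion strategy (a single global feature as decoder memory), vocabulary size, and decoder depth are assumptions for illustration.

```python
# Heavily simplified sketch of a YOLOv8 + EfficientNetB7 + Transformer
# captioning pipeline. Fusion details and hyperparameters are assumptions.
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b7, EfficientNet_B7_Weights
from ultralytics import YOLO

# 1) Object detection: YOLOv8 supplies object labels for the image.
detector = YOLO("yolov8n.pt")
frame = np.zeros((640, 640, 3), dtype=np.uint8)     # stand-in for a real photo
det = detector(frame, verbose=False)[0]
object_labels = [detector.names[int(c)] for c in det.boxes.cls]

# 2) Feature extraction: EfficientNetB7 provides a global image embedding.
backbone = efficientnet_b7(weights=EfficientNet_B7_Weights.DEFAULT)
backbone.classifier = nn.Identity()                 # expose the 2560-d feature
backbone.eval()
with torch.no_grad():
    feat = backbone(torch.rand(1, 3, 600, 600))     # shape (1, 2560)

# 3) Sequence modeling: a Transformer decoder attends to the image feature.
class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.img_proj = nn.Linear(2560, d_model)    # project image feature
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, img_feat):
        memory = self.img_proj(img_feat).unsqueeze(1)   # (B, 1, d_model)
        return self.out(self.decoder(self.embed(tokens), memory))

decoder = CaptionDecoder()
logits = decoder(torch.zeros(1, 5, dtype=torch.long), feat)  # next-token scores
print(object_labels, logits.shape)                  # torch.Size([1, 5, 10000])
```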