Funding: Financially supported by the National Natural Science Foundation of China (Nos. 22001175, 51973118, 22175121 and 52003160), the Key-Area Research and Development Program of Guangdong Province (Nos. 2019B010929002 and 2019B010941001), the Natural Science Foundation of Guangdong Province (No. 2020A1515010644), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2019ZT08C642), the Shenzhen Science and Technology Program (Nos. JCYJ20210324095412035, JCYJ20190808113005643, JCYJ20170818093832350 and JCYJ20180507184711069), and the start-up fund of Shenzhen University (No. 000002110820).
Abstract: Artificial soft actuators, featuring non-equilibrium internal environments and fast, programmable shape transformations, have recently attracted strong research interest due to their flexibility, high controllability, and designability. However, wireless soft actuators that achieve locomotion on slopes of widely varying grade through multiple forms of energy conversion have rarely been reported. Herein, we create an asymmetric bilayer strategy to construct autonomous soft crawlers that "breathe" moisture to drive their mechanical deformation. The soft crawlers exhibit conspicuous performance, including periodic tumbler locomotion predicted via an improved Timoshenko equation, multiple reversible shape morphing (circle, helix, despiralization, etc.) determined by their fiber orientation, controllable drive modes (front drive and rear drive), and a rapid climbing speed (4.76 cm/min) across a wide range of slope angles. Through architectural design, they can be connected in series or in parallel to construct multi-joint complex actuators. Beyond climbing, an intelligent soft ring-pull with admirable cycling performance, intended to guard against overheating or contact with untouchable objects, has also been proposed. The soft crawlers further realize multiple energy conversion, as they can also be actuated by light irradiation. We envision that this soft crawler system has enormous potential in intelligent machines, microscopic diagnosis and treatment, biosensing, and energy harvesting and conversion.
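The tumbler locomotion above is predicted via an improved Timoshenko equation. The improved form is not reproduced in this abstract, but the classical Timoshenko bilayer curvature formula that such models extend is standard: for two bonded layers with a differential actuation strain (here, a hygroscopic swelling mismatch rather than a thermal one), the bending curvature depends on the thickness and modulus ratios. A sketch of the classical formula only, not the paper's improved version:

```python
def bilayer_curvature(delta_eps, t1, t2, E1, E2):
    """Classical Timoshenko bilayer curvature (1/m).

    delta_eps: differential actuation strain between the two layers
               (for a moisture-driven bilayer, the swelling mismatch
               plays the role of alpha * dT in the thermal case).
    t1, t2:    layer thicknesses (m); E1, E2: Young's moduli (Pa).
    """
    m = t1 / t2          # thickness ratio
    n = E1 / E2          # modulus ratio
    h = t1 + t2          # total bilayer thickness
    denom = 3.0 * (1 + m) ** 2 + (1 + m * n) * (m ** 2 + 1.0 / (m * n))
    return 6.0 * delta_eps * (1 + m) ** 2 / (h * denom)
```

A common sanity check: for identical layers (m = n = 1) the expression reduces to kappa = 3 * delta_eps / (2h).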
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62203174 and the Guangzhou Municipal Science and Technology Project under Grant No. 202201010179.
Abstract: Soft robotic crawlers have limited payload capacity and crawling speed. This study proposes a high-performance inchworm-like modular robotic crawler based on fluidic prestressed composite (FPC) actuators. The FPC actuator is precurved, and a pneumatic source is used to flatten it, so no energy is required to maintain the equilibrium curved shape. Pressurizing and depressurizing the actuators generates alternating stretching and bending motions, producing the crawling motion of the robotic crawler. Multi-modal locomotion (crawling, turning, and pipe climbing) is achieved through modular reconfiguration and gait design. An analytical kinematic model is proposed to characterize the quasi-static curvature and step size of a single-module crawler. Multiple configurations of robotic crawlers are fabricated to demonstrate the crawling ability of the proposed design. A set of systematic experiments is conducted to understand how crawler responses vary as a function of FPC prestrain, input pressure, and actuation frequency. In the experiments, the maximum carrying-load ratio (carrying load divided by robot weight) is found to be 22.32, and the highest crawling velocity is 3.02 body lengths (BL) per second (392 mm/s). Multi-modal capabilities are demonstrated by reconfiguring three soft crawlers, including a matrix crawler robot crawling in amphibious environments, an inching crawler turning at an angular velocity of 2°/s, and earthworm-like crawling robots climbing a 20° slope and a pipe.
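The paper's analytical kinematic model for quasi-static curvature and step size is not reproduced in this abstract. As an illustrative sketch only (not the paper's model), under a constant-curvature assumption the footprint of a bent actuator is the chord of its arc, and one plausible per-cycle step is the footprint difference between the flat and curved states:

```python
import math

def chord_length(L, kappa):
    """Chord spanned by an arc of length L with constant curvature kappa."""
    if kappa == 0:
        return L  # flat actuator: footprint equals its length
    return (2.0 / kappa) * math.sin(kappa * L / 2.0)

def step_size(L, kappa):
    """Illustrative step per actuation cycle: the actuator alternates
    between a flat state (footprint L) and a curved state (footprint =
    chord), so the anchored end advances by the footprint difference."""
    return L - chord_length(L, kappa)
```

For a semicircular bend (kappa * L = pi) the chord is 2L/pi, and the step grows monotonically with curvature, matching the intuition that deeper bending yields a longer stride.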
Funding: This work is supported by the U.S. Office of Naval Research under Grants N00014-16-1-3214 and N00014-16-1-3216.
Abstract: Web crawlers have been misused for several malicious purposes, such as downloading server data without permission from the website administrator. Moreover, armoured crawlers are evolving against new anti-crawler mechanisms in the arms race between crawler developers and crawler defenders. In this paper, based on the observation that normal users and malicious crawlers have different short-term and long-term download behaviours, we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers. By adding a marker to each Uniform Resource Locator (URL), we can trace the page that leads to the access of this URL and the identity of the user who accesses it. With this supporting information, we can not only perform more accurate heuristic detection using path-related features, but also develop a Support Vector Machine (SVM)-based machine learning detection model to distinguish malicious crawlers from normal users by inspecting their different patterns of URL visiting paths and URL visiting timings. In addition to detecting crawlers effectively at the earliest stage, PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected. We deploy our approach on an online forum website, and the evaluation results show that PathMarker can quickly capture all six open-source and in-house crawlers, plus two external crawlers (Googlebot and Yahoo! Slurp).
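PathMarker's core idea, as described, is to append to every served URL a marker recording the linking page and the user identity. The paper's exact encoding is not given here; one plausible sketch signs the marker with an HMAC so crawlers cannot forge or strip it undetected (the `m`/`sig` parameter names and signing scheme are assumptions):

```python
import hmac, hashlib
from urllib.parse import urlencode, urlparse, parse_qs

SECRET = b"server-side-secret"   # hypothetical server-side signing key

def mark_url(url, parent_page, user_id):
    """Append a tamper-evident marker recording which page linked to
    this URL and which user session it was served to."""
    marker = f"{parent_page}|{user_id}"
    sig = hmac.new(SECRET, marker.encode(), hashlib.sha256).hexdigest()[:16]
    sep = "&" if "?" in url else "?"
    return url + sep + urlencode({"m": marker, "sig": sig})

def verify_marker(url):
    """Recover (parent_page, user_id) if the signature checks out."""
    qs = parse_qs(urlparse(url).query)
    marker = qs.get("m", [""])[0]
    sig = qs.get("sig", [""])[0]
    good = hmac.new(SECRET, marker.encode(), hashlib.sha256).hexdigest()[:16]
    if marker and hmac.compare_digest(sig, good):
        parent, _, user = marker.partition("|")
        return parent, user
    return None
```

The recovered (parent page, user) pairs are exactly the path and identity features the detection side would feed into heuristics or an SVM classifier.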
Funding: Supported by the National Natural Science Foundation of China under Grant No. 72104108 and the College Students' Innovation and Entrepreneurship Training Program (No. 202410298155Y).
Abstract: Green consumption (GC) is crucial for achieving the Sustainable Development Goals (SDGs). However, few studies have explored public attitudes toward GC using social media data, missing potential public concerns captured through big data. To address this gap, this study collects and analyzes public attention toward GC using web crawler technology. Based on data from Sina Weibo, we applied RoBERTa, an advanced NLP model based on the transformer architecture, to conduct fine-grained sentiment analysis of the public's attention, attitudes, and hot topics on GC, demonstrating the potential of deep learning methods in capturing dynamic and contextual emotional shifts across time and regions. Among the sample (N = 188,509), 53.91% expressed a positive attitude, with variation across times and regions. Temporally, public interest in GC has shown an annual growth rate of 30.23%, gradually shifting from fulfilling basic needs to prioritizing entertainment consumption. Spatially, GC is most prevalent in the southeast coastal regions of China, with Beijing ranking first across the five evaluated domains. Individuals and government-affiliated accounts play a key role in public discussions on social networks, accounting for 45.89% and 30.01% of user reviews, respectively. A significant positive correlation exists between economic development and public attention to GC, as indicated by a Pearson correlation coefficient of 0.55. Companies, in particular, exhibit cautious behavior in the early stages of green product adoption, prioritizing profitability before making substantial investments. These findings provide valuable insights into the evolving public perception of GC, contributing to the development of more effective environmental policies in China.
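The reported Pearson coefficient of 0.55 between economic development and public attention is a standard computation. For readers reproducing such an analysis on their own regional data (the study's data are not reproduced here), a minimal implementation:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Feeding it per-region economic indicators as `xs` and GC post counts as `ys` yields the correlation; values near +1 indicate that wealthier regions discuss GC more, consistent with the study's finding.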
Funding: Supported by the Quality Engineering Project of Guangdong University of Science and Technology under Grant GKZLGC2024160.
Abstract: The data collection and web crawling course involves extensive theoretical knowledge and is strongly practical; traditional teaching methods are no longer sufficient to meet its teaching needs. Based on the characteristics of the course, this article constructs a blended teaching environment built on "Learning Pass + Hongya Platform + Offline Course," integrates teaching resource libraries and ideological-political cases, and develops a suitable evaluation system to cultivate students' innovative and critical thinking, stimulate their learning initiative, improve their teamwork ability, and enhance their professional level and data literacy.
Funding: Project (No. 2007C23086) supported by the Science and Technology Plan of Zhejiang Province, China.
Abstract: Focused crawling is an important technique for topical resource discovery on the Web. The key issue in focused crawling is to prioritize uncrawled uniform resource locators (URLs) in the frontier so as to focus the crawl on relevant pages. Traditional focused crawlers rely mainly on content analysis; link-based techniques are not effectively exploited despite their usefulness. In this paper, we propose a new frontier-prioritizing algorithm, the on-line topical importance estimation (OTIE) algorithm. OTIE combines link- and content-based analysis to evaluate the priority of an uncrawled URL in the frontier. We performed real crawling experiments over 30 topics selected from the Open Directory Project (ODP) and compared the harvest rate and target recall of four crawling algorithms: breadth-first, link-context-prediction, on-line page importance computation (OPIC), and our OTIE. Experimental results show that OTIE significantly outperforms the other three algorithms on average target recall while maintaining an acceptable harvest rate. Moreover, OTIE is much faster than the traditional focused crawling algorithm.
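The abstract states that OTIE blends link- and content-based scores to prioritize frontier URLs but does not give the combination rule. A hedged sketch of such a priority frontier, using a simple weighted sum with an assumed mixing parameter `alpha` (the actual OTIE score is more involved):

```python
import heapq

class Frontier:
    """Priority frontier for a focused crawler.  The real OTIE score
    combines on-line link importance with content relevance; here the
    two are blended with a plain weighted sum as an assumption."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha        # weight given to the link-based score
        self._heap = []
        self._count = 0           # tie-breaker keeps ordering stable

    def push(self, url, link_score, content_score):
        score = self.alpha * link_score + (1 - self.alpha) * content_score
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        """Return the uncrawled URL with the highest combined priority."""
        return heapq.heappop(self._heap)[2]
```

A crawler loop would repeatedly `pop()` the best URL, fetch it, score its out-links, and `push()` them back; tuning `alpha` trades off harvest rate against target recall.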
Funding: Supported by the National Natural Science Foundation of China (60373099).
Abstract: Focused crawlers are important tools supporting applications such as specialized Web portals, online searching, and Web search engines. A topic-driven crawler chooses the best URLs and the most relevant pages to pursue during Web crawling, and dealing with irrelevant pages is difficult. This paper presents a novel focused crawler framework in which we propose a method that overcomes some of the limitations of handling irrelevant pages. We also describe the implementation of our focused crawler and present several important metrics and an evaluation function for ranking page relevance. The experimental results show that our crawler obtains more "important" pages and achieves high precision and recall.
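The precision and recall values quoted above are computed over the crawled set versus a known relevant (target) set. A minimal helper showing that computation (the set arguments in the usage are illustrative):

```python
def precision_recall(crawled, relevant):
    """Precision and recall of a crawl.

    crawled:  set of page URLs the crawler fetched.
    relevant: the (assumed known) set of on-topic target pages.
    """
    hits = len(crawled & relevant)
    precision = hits / len(crawled) if crawled else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

High precision means few wasted fetches on irrelevant pages; high recall means the crawl reached most of the target set, which is the trade-off the framework's evaluation function is tuning.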
Abstract: As data grows in size, search engines face new challenges in extracting the most relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher rankings. The paper addresses the issue of bias by proposing a new ranking algorithm based on PageRank (PR), one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The WPR model was used in three distinct trials to compare the rankings of documents and pages based on one or more user-preference criteria. The findings of utilizing the WPR model show that using multiple criteria to rank final pages is better than using only one, and that some criteria have a greater impact on ranking results than others.
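The WPR family of algorithms modifies classic PageRank by distributing a page's rank to its out-links in proportion to per-edge weights rather than uniformly. A minimal power-iteration sketch of that idea (how the weights are derived from link counts or user-preference criteria is left as an input, since the paper's exact weighting is not given here):

```python
def weighted_pagerank(graph, weights, d=0.85, iters=50):
    """Power iteration for a weighted PageRank variant.

    graph:   {node: [successor, ...]}
    weights: {(u, v): weight of edge u -> v}; in a WPR scheme these
             would come from in-/out-link counts or preference criteria.
    d:       damping factor, as in standard PageRank.
    """
    n = len(graph)
    pr = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        nxt = {node: (1 - d) / n for node in graph}
        for u, succs in graph.items():
            if not succs:
                continue
            total = sum(weights[(u, v)] for v in succs)
            for v in succs:
                # distribute u's rank in proportion to edge weight
                nxt[v] += d * pr[u] * weights[(u, v)] / total
        pr = nxt
    return pr
```

With uniform weights this reduces to ordinary PageRank; skewing a weight toward one successor raises that page's rank, which is the mechanism the trials above exploit to encode user preferences.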
Funding: This work was supported by the Ministry of Higher Education Malaysia's Fundamental Research Grant Scheme under Grant FRGS/1/2021/ICT07/USM/03/1.
Abstract: A cyber-criminal compromises end-hosts (bots) to configure a network of bots (a botnet). Cyber-criminals also seek evolved architectures that make their techniques more resilient and stealthier, such as Peer-to-Peer (P2P) networks. P2P botnets leverage the decentralized nature of P2P networks and exploit the resilience of this architecture to withstand take-down procedures. Some P2P botnets are smart enough to keep their Command-and-Control (C2) mechanisms stealthy and to elude standard discovery mechanisms. The other side of this cyberwar is therefore monitoring. P2P botnet monitoring is an exacting mission because it must address many aspects simultaneously: some pertain to the existing monitoring approaches, some to the nature of P2P networks, and some to the botnets' countermeasures, i.e., anti-monitoring mechanisms. All these challenges should be considered in P2P botnet monitoring. This paper first provides an anatomy of P2P botnets. It then exhaustively reviews the existing monitoring approaches for P2P botnets, thoroughly discussing each to reveal its advantages and disadvantages, and groups them into three classes: passive, active, and hybrid monitoring approaches. Furthermore, the paper discusses the functional and non-functional requirements of advanced monitoring. It concludes by summarizing the challenges across these aspects and giving future avenues for better monitoring of P2P botnets.