Funding: Project (2018-YFB1004202) supported by the National Key R&D Program of China; Project (61702534) supported by the National Natural Science Foundation of China.
Abstract: Nowadays, more and more Android developers prefer to seek help from Q&A websites like Stack Overflow, despite the rich official documentation. Several studies have examined the limitations of official application programming interface (API) documentation and proposed approaches to improve it. However, few of them dug into the requirements of third-party developers. In this work, we gain insight into this question from the perspectives of both API developers and API users through a kind of cross-validation. We propose a hybrid approach, which combines manual inspection of artifacts with an online survey of the corresponding developers, to explore the different focuses of these two types of stakeholders. We manually inspected 1000 posts and received 319 questionnaires in total. Through mutual verification of the inspection and survey results, we found that users are more concerned with the usage of APIs, while the official documentation mainly provides functional descriptions. Furthermore, we identified 9 flaws in the official documentation and summarized 12 aspects (from content to representation) in which the official API documentation could be improved.
Abstract: In the field of computer science, software developers use a wide array of social collaborative platforms for learning and cooperating. The most popular ones are GitHub and Stack Overflow. Existing platforms only support search queries that extract relevant repository information from GitHub, or questions and answers from Stack Overflow. This ignores the valuable coder-related question: who are the top experts (geek talents) in a specific area? This information is important to companies, to open source projects, and to those who want to learn from an expert role model. Thus, finding the right developers is a crucial yet challenging problem. Most current work focuses on recommending experts for a particular software engineering task and ignores the relationships between developers across different projects. In this paper, we propose a novel technique that automatically identifies geek talents from GitHub, from Stack Overflow, and across both communities. The results show that our approach performs well at recommending suitable developers in diverse areas.
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 61572426 and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2015BAH17F01.
Abstract: Security has always been a popular and critical topic, and with the rapid development of information technology it continues to attract attention. However, because security has a long history, it covers a wide range of topics that change considerably over time, from classic cryptography to the recently popular mobile security. There is a need to investigate security-related topics and trends, which can guide security researchers, educators, and practitioners. To address this need, we conduct a large-scale study of security-related questions on Stack Overflow. Stack Overflow is a popular online question and answer site where software developers communicate, collaborate, and share information with one another. Among the many topics covered by the questions posted on Stack Overflow, security-related questions make up a large proportion and hold an important position. We first use two heuristics to extract security-related questions from the dataset based on the tags of the posts. We then use an advanced topic model, Latent Dirichlet Allocation (LDA) tuned using a Genetic Algorithm (GA), to cluster the security-related questions based on their texts. After obtaining the topics of the security-related questions, we use their metadata to perform various analyses. We summarize all the topics into five main categories and also investigate the popularity and difficulty of the different topics. Based on the results of our study, we draw several implications for researchers, educators, and practitioners.
Abstract: Stack Overflow is a popular online question and answer site where software developers share their experience and expertise. Among the numerous questions posted on Stack Overflow, two or more may express the same point and thus be duplicates of one another. Duplicate questions make site maintenance harder, waste resources that could have been used to answer other questions, and cause developers to wait unnecessarily for answers that are already available. To reduce the problem, Stack Overflow allows questions to be manually marked as duplicates of others. Since thousands of questions are submitted to Stack Overflow every day, manually identifying duplicates is difficult. Thus, there is a need for an automated approach that can help detect these duplicate questions. To address this need, we propose an automated approach named DupPredictor that takes a new question as input and detects potential duplicates of it by considering multiple factors. DupPredictor extracts the title and description of a question, as well as the tags attached to it. These pieces of information (title, description, and a few tags) are mandatory when a user posts a question. DupPredictor then computes the latent topics of each question using a topic model. Next, for each pair of questions, it computes four similarity scores by comparing their titles, descriptions, latent topics, and tags. These four similarity scores are finally combined into a new similarity score that comprehensively considers the multiple factors. To examine the benefit of DupPredictor, we perform an experiment on a Stack Overflow dataset that contains more than two million questions. The results show that DupPredictor achieves a recall-rate@20 score of 63.8%. We compare our approach with the standard search engine of Stack Overflow, and DupPredictor improves its recall-rate@10 score by 40.63%. We also compare our approach with approaches that use only title, description, topic, or tag similarity, and with Runeson et al.'s approach, which has been used to detect duplicate bug reports; DupPredictor improves their recall-rate@10 scores by 27.2%, 97.4%, 746.0%, 231.1%, and 16.4%, respectively.
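The combination of four per-field similarity scores and the recall-rate@k measure mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which similarity measure or weights are used, so Jaccard overlap, the equal `WEIGHTS`, the representation of topics as label sets, and all function names below are assumptions made purely for illustration. Recall-rate@k is taken here to be the fraction of queries whose known duplicate appears among the top k ranked candidates.

```python
from typing import Dict, List, Set

Question = Dict[str, Set[str]]  # field name -> set of tokens (illustrative format)

FIELDS = ("title", "description", "topics", "tags")
# Hypothetical equal weights; the abstract does not state how scores are combined.
WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(q1: Question, q2: Question) -> float:
    """Weighted sum of the four per-field similarity scores."""
    return sum(w * jaccard(q1[f], q2[f]) for w, f in zip(WEIGHTS, FIELDS))


def rank_candidates(new_q: Question, corpus: Dict[str, Question], k: int) -> List[str]:
    """Return the ids of the k corpus questions most similar to new_q."""
    ranked = sorted(
        corpus, key=lambda qid: combined_score(new_q, corpus[qid]), reverse=True
    )
    return ranked[:k]


def recall_rate_at_k(
    ranked: Dict[str, List[str]], true_duplicate: Dict[str, str], k: int
) -> float:
    """Fraction of queries whose known duplicate is within the top k results."""
    hits = sum(1 for q, ids in ranked.items() if true_duplicate[q] in ids[:k])
    return hits / len(ranked)
```

In this sketch a new question is ranked against every stored question, so a realistic system over millions of questions would need indexing or candidate filtering that is outside the scope of the example.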
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61902050, 61602077 and 61672122; the China Postdoctoral Science Foundation under Grant No. 2020M670736; the Fundamental Research Funds for the Central Universities of China under Grant Nos. 3132019355 and 2020cxxmss14; and the High Education Science and Technology Planning Program of Shandong Provincial Education Department of China under Grant Nos. J18KA340 and J18KA385.
Abstract: Stack Overflow provides a platform for developers to seek suitable solutions by asking questions and receiving answers on various topics. However, many questions are not answered quickly enough. Since questioners are eager to know when a question is likely to be answered, reporting an expected answer time becomes an important task for Stack Overflow. To address this issue, we propose a model for predicting the answer time of questions, named Predicting Answer Time (the PAT model), which consists of two parts: a feature acquisition and fusion model, and a deep neural network model. The framework uses a variety of features mined from Stack Overflow questions, including the question description, question title, question tags, the creation time of the question, and other temporal features. These features are fused and fed into the deep neural network to predict the answer time of the question. As a case study, post data from Stack Overflow are used to assess the model. We use traditional regression algorithms as baselines, namely Linear Regression, K-Nearest Neighbors Regression, Support Vector Regression, Multilayer Perceptron Regression, and Random Forest Regression. Experimental results show that the PAT model predicts the answer time of questions more accurately than the traditional regression algorithms and reduces the error of the predicted answer time by nearly 10 hours.
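The feature fusion step and the notion of prediction error in hours can be sketched as follows. This is only an illustration under stated assumptions: the abstract names the feature types but not their encoding, the fusion operator, or the error metric, so the one-hot tag vector, the two scaled temporal features, concatenation as the fusion step, and mean absolute error are all hypothetical choices, as are the function names.

```python
from datetime import datetime
from typing import List, Sequence


def temporal_features(created: datetime) -> List[float]:
    """Hour of day and day of week of the creation time, scaled to [0, 1)."""
    return [created.hour / 24.0, created.weekday() / 7.0]


def tag_features(tags: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Binary indicator vector over a fixed (hypothetical) tag vocabulary."""
    present = set(tags)
    return [1.0 if tag in present else 0.0 for tag in vocabulary]


def fuse(*parts: Sequence[float]) -> List[float]:
    """Fuse feature groups by simple concatenation (an assumed fusion step)."""
    fused: List[float] = []
    for part in parts:
        fused.extend(part)
    return fused


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap, in hours, between predicted and actual answer times."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

A fused vector of this kind would be the input to whichever regressor is being compared, and the same error function could then be applied to each model's predictions to compare them in hours.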
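Abstract: Buffer overflow vulnerabilities are widespread in programs written in unsafe high-level languages. By exploiting them, attackers can mount dangerous attacks such as control-flow hijacking. Canary-based stack protection is a simple, effective, and widely deployed defense against buffer overflow vulnerabilities; however, because canaries have a fixed location and share the same value, they are easy for attackers to analyze and defeat. This paper proposes a stack protection technique based on software diversity. Built around heterogeneous canaries with randomized sizes and offsets, it not only directly resists the leakage and overwrite attacks that conventional canaries cannot handle, but can also be used to construct a variety of more secure diversified software systems. Experimental results show that, while effectively improving security, heterogeneous canaries add no more than 2% compilation overhead and an average of 3.22% runtime overhead on the SPEC CPU 2017 benchmark suite.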