Paper: Predicting Defects for Eclipse

Tuesday, April 7, 2009

T. Zimmermann, R. Premraj, and A. Zeller, "Predicting defects for Eclipse," in PROMISE '07: Proceedings of the Third International Workshop on Predictor Models in Software Engineering. Washington, DC, USA: IEEE Computer Society, 2007, pp. 9+.

In this study, Zimmermann et al. map defects from the Eclipse bug database to the source code, covering both pre- and post-release defects. They also compute several complexity metrics for each file and package, explore how those metrics correlate with pre- and post-release defect counts, and briefly examine how the metrics can be used to predict defect proneness. The authors have published all of their data. Among other things, their results show that pre- and post-release defects are strongly correlated (a package or file that is buggy before release is still buggy after it); that all complexity measures show at least a positive correlation with pre- and post-release defect counts (a more complex package or file has more defects); and that reasonable models for assessing defect proneness in later releases can be learned, using linear regression, from the data of a single release.
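To make the correlation claim concrete, here is a minimal sketch of how one might check whether files with many pre-release defects also have many post-release defects. It uses a rank (Spearman) correlation, which is my assumption for the illustration and not necessarily the statistic used in the paper. The function names are hypothetical, and the input lists stand in for per-file defect counts taken from the published data.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical usage: one entry per file, in the same file order.
# pre_release = [...]   # pre-release defect counts
# post_release = [...]  # post-release defect counts
# print(spearman(pre_release, post_release))
```

A value close to 1 would mean that the ordering of files by pre-release defects closely matches their ordering by post-release defects. The same function could be applied to a complexity metric and a defect count.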

What's interesting to me about this study is the definition of a defect and the method of counting defects. Zimmermann et al. define a defect by the bug report and the associated code change that fixes it. In this way, a defect is anything worth fixing. In Hatton's terms, this definition covers both faults and failures, but restricts them to the problems that users and developers find relevant.

Counting defects programmatically takes two steps. In the first step, fixes are identified by searching the version control change log for entries that reference bugs (e.g. "fixed 42233" or "bug #23444"). In the second step, the release that each fix applies to is determined by looking up the bug report in the bug tracking system. This method could be adapted to any project whose developers consistently mark fixes with a reference to the bug tracking system or a release number (including in code comments). The authors reference three other papers that use a similar technique.
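The two steps described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the authors' implementation: the regular expression covers just the two phrasings quoted above, the change log is assumed to be a list of (file path, commit message) pairs, and the bug tracker is stood in for by a hypothetical dictionary mapping bug IDs to releases.

```python
import re
from collections import defaultdict

# Assumed pattern: matches phrasings such as "fixed 42233" or "bug #23444".
# A real project would need patterns tuned to its own commit conventions.
BUG_REFERENCE = re.compile(r"\b(?:fix(?:ed|es)?|bugs?)\s*#?\s*(\d+)", re.IGNORECASE)


def extract_bug_ids(message):
    """Step 1: collect candidate bug IDs mentioned in a commit message."""
    return {int(number) for number in BUG_REFERENCE.findall(message)}


def count_defects(change_log, bug_reports):
    """Count distinct fixed bugs per (file, release).

    change_log:  iterable of (file_path, commit_message) pairs (assumed shape).
    bug_reports: dict mapping bug ID to release, a hypothetical stand-in
                 for a query against the bug tracking system.
    """
    seen = set()
    counts = defaultdict(int)
    for file_path, message in change_log:
        for bug_id in extract_bug_ids(message):
            # Step 2: look up the release in the bug report. Numbers that
            # are not known bug IDs (false matches) are dropped here.
            release = bug_reports.get(bug_id)
            if release is None:
                continue
            key = (file_path, release, bug_id)
            if key in seen:
                continue  # count each bug once per file and release
            seen.add(key)
            counts[(file_path, release)] += 1
    return dict(counts)
```

Looking up every candidate number in the bug tracker also acts as a filter: a message like "fixes 3 typos" matches the pattern, but is discarded unless 3 happens to be a real bug ID.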
