Wednesday, November 21, 2012
More projects == better defect predictions?
More efficient defect prediction when using data from multiple projects?
While looking at the newest research in the defect prediction field, I came across a piece of work that intrigued me. Usually we build statistical models, in the form of equations describing defect inflows, or we use analogy-based estimates; in both cases we use historical data to create models for new projects. This usually works fine, but this paper takes things one step further, namely (and I quote):
RQ2: How much within project data should be enriched with data from other projects to achieve comparable performance with full within project data predictions?
The results show that using only 10% of the within-project data, when enriched with data from other projects, can yield predictions of the same quality, which can significantly improve the cost-efficiency of defect prediction in industrial contexts.
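To make the idea in RQ2 a bit more concrete, here is a minimal sketch of how such a mixed training set could be put together. It is not the paper's method: the function name `build_mixed_training_set`, the random sampling strategy, the toy data, and the choice to append all cross-project rows are my own assumptions for illustration.

```python
import random


def build_mixed_training_set(within_rows, cross_rows, within_fraction=0.1, seed=0):
    """Keep a random fraction of within-project rows and add all cross-project rows.

    Hypothetical helper: random sampling is an assumption, not the paper's procedure.
    """
    if not 0.0 <= within_fraction <= 1.0:
        raise ValueError("within_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n_keep = round(len(within_rows) * within_fraction)
    kept = rng.sample(within_rows, n_keep)
    return kept + list(cross_rows)


# Toy data (made up): each row is (lines_of_code, is_defective)
within_project = [(100 + i, i % 4 == 0) for i in range(50)]
other_projects = [(200 + i, i % 5 == 0) for i in range(200)]

training = build_mixed_training_set(within_project, other_projects)
print(len(training))
```

The resulting list could then be fed to whatever prediction model you normally train on full within-project data, so the two setups can be compared on the same test set.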