In the last decade, researchers have developed tools to automatically detect bad smells, i.e., symptoms of poor design and implementation choices. While such smells have been the object of several empirical studies, there is still little knowledge about when and why they are introduced. To fill this gap, we conducted a large empirical study over the change history of 200 open source projects, investigating when developers introduce bad smells and the circumstances and reasons behind their introduction. Our study required the development of a strategy to identify smell-introducing commits, the mining of over 0.5M commits, and the manual analysis of 9,164 of them (i.e., those identified as smell-introducing). Our findings mostly contradict the common wisdom that smells are introduced during evolutionary tasks, and they motivate the development of a new generation of recommenders aimed at properly planning smell refactoring activities.
RQ1: When are code smells introduced?
- Raw data on the number of commits required for a smell to be introduced: rawDataTime.zip
- Raw data on metric trends: rawDataMetrics.zip
- R² achieved by the different functions tested for regression analysis (leading to the selection of the linear regression model): regressionR2.zip
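
For readers who want to see what comparing candidate regression functions by R² looks like, the following is a minimal sketch, not the authors' scripts. It fits a linear and a logarithmic function to a metric trend by ordinary least squares and reports the R² of each. The function names, the choice of candidate functions, and the sample values are all hypothetical; the actual functions tested and the data layout are those in regressionR2.zip and rawDataMetrics.zip.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def compare_models(xs, ys):
    """Return the R² of two candidate functions (hypothetical choice)."""
    scores = {}

    # Candidate 1: linear, y = a + b * x
    a, b = linear_fit(xs, ys)
    scores["linear"] = r_squared(ys, [a + b * x for x in xs])

    # Candidate 2: logarithmic, y = a + b * ln(x), fitted on transformed x
    log_xs = [math.log(x) for x in xs]
    a, b = linear_fit(log_xs, ys)
    scores["logarithmic"] = r_squared(ys, [a + b * lx for lx in log_xs])

    return scores


# Hypothetical data: a metric value observed over six consecutive commits.
commits = [1, 2, 3, 4, 5, 6]
metric = [10.0, 12.1, 13.9, 16.2, 18.0, 19.8]

scores = compare_models(commits, metric)
best = max(scores, key=scores.get)  # function with the highest R²
```

On this made-up, nearly linear series the linear function obtains the higher R²; real metric trends should be loaded from the raw data instead.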
RQ2: Why are code smells introduced?
- Raw data of assigned tags: assignedTags.zip
Replication of the study considering only the 2,555 manually validated smell-introducing commits
- Achieved results: validated.pdf