Sweary source code comments a sign of competence

Open source code with profanity in its comments is a sign of superior programming (PDF), according to a student at the Institute of Theoretical Informatics (ITI) of the Karlsruhe Institute of Technology, Germany. In his thesis, Jan Strehmel analyzed more than 10,000 GitHub projects written in C for the presence of English profanity, then ran a suite of code quality tests on them. Swearing correlated with higher scores. [via JWZ, desdelinux]


3.2 Conclusion and Future Work

By using the Git-API we successfully collected more than 3,800 repositories containing swearwords. Those swear-repos were then evaluated with the SoftWipe tool to calculate a score, which represents the code quality. Those swear-repos were then compared to over 7600 repositories, which we selected to be our general population and also analysed with the SoftWipe tool. This comparison was done by running multiple hypothesis tests, such as the Kolmogorov-Smirnov test. These tests, combined with our visual analysis of the data yielded the result that repositories containing swearwords exhibit a statistically significant higher average code-quality (5.87) compared to our general population (5.41). There are many things that you can consider to further add to this study. Of course "[…] there's no data like more data"-Kai-Fu Lee, thus obtaining more swearword samples is of course of interest to us, as well as the possibility to include C++ code as well. During the data crawling, one could of course also deploy natural language processing to more accurately identify swearwords.

Strehmel assumes a correlative relationship, not a causative one. He proposes a common cause for both the swearing and the measurably higher quality: stress.
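
For readers curious about the Kolmogorov-Smirnov test named in the excerpt, here is a minimal sketch of the two-sample version in plain Python. It is not Strehmel's code. The function names and the example scores are invented for illustration, and the critical value uses the standard large-sample approximation, which is unreliable for samples as tiny as the ones shown.

```python
from bisect import bisect_right
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest vertical gap between
    the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that only change at
    # observed values, so checking those points is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n_a, n_b, alpha=0.05):
    """Large-sample approximation of the rejection threshold:
    reject 'same distribution' at level alpha if D exceeds it."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n_a + n_b) / (n_a * n_b))


if __name__ == "__main__":
    # Hypothetical quality scores, NOT data from the thesis.
    swear_scores = [5.9, 6.1, 5.7, 6.3, 5.8]
    general_scores = [5.2, 5.5, 5.4, 5.6, 5.3]
    d = ks_statistic(swear_scores, general_scores)
    threshold = ks_critical_value(len(swear_scores), len(general_scores))
    print(f"D = {d:.3f}, approximate 5% threshold = {threshold:.3f}")
```

The test compares whole distributions rather than just averages, which is presumably why it suits a comparison of score distributions across thousands of repositories. Like any such test, it says nothing about cause.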