Code Analysis Rules and the Click-Through Culture

Code analysis rules are a bit contentious to me, and I often have friendly disagreements with the colleagues responsible for developing our custom in-house rules. I am not an expert in code analysis and have never written a rule myself, but as a technical lead I see them in action daily and have a few opinions about them.

One of the most critical things about code analysis rules is that they must be well implemented and must attack real problems that are practical to detect reliably. The problem I see with many in-house rules is that they have not been fully tested and often have high false positive rates, which teaches programmers to ignore them. Just as your mum clicks through security warnings, developers click through noisy rules, and the rules end up less relevant and poorly understood.

For a code analysis rule to bring value it must be understood and must have a very low false positive rate. When a rule fails a build, developers should understand why it fired and how to fix the code. The moment developers regularly see rules firing when they shouldn't, and are forced to suppress them, the click-through culture has been created.
 
To avoid this pitfall, first establish whether the code quality problem can even be detected reliably. Most teams do not have the time that research and development teams are allotted; they cannot spend large amounts of time developing these rules and must make a best effort instead. But if that best effort cannot realistically detect occurrences of the problem it is trying to prevent, then a code analysis rule is not the best tool for the job.

Testing of code analysis rules is also critical. The best way to do that is to have a large volume of real-world examples from your own codebase to use as a test set. A good log analytics system can often be leveraged to find them; for example, to find problems with the dispose pattern you could search the logs for ObjectDisposedException and trace each occurrence back to the code responsible.
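As a rough sketch of that approach (assuming the logs can be exported to a plain-text file and that stack traces use the standard .NET "at Namespace.Type.Method(...)" format; the file name app.log is just a placeholder), the following groups ObjectDisposedException occurrences by the first stack frame found after each one, giving a ranked list of candidate call sites to seed a rule's test set:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;

    class DisposedExceptionSurvey
    {
        static void Main(string[] args)
        {
            // Exported log file to scan; "app.log" is a placeholder path.
            var lines = File.ReadLines(args.Length > 0 ? args[0] : "app.log").ToList();

            // First ".NET style" stack frame after the exception message.
            var frame = new Regex(@"^\s*at\s+(?<method>[\w.`<>+]+)\(");

            var hits = new List<string>();
            for (int i = 0; i < lines.Count; i++)
            {
                if (!lines[i].Contains("ObjectDisposedException")) continue;

                // Look a few lines ahead for the first recognisable frame and record it.
                for (int j = i + 1; j < Math.Min(i + 20, lines.Count); j++)
                {
                    var match = frame.Match(lines[j]);
                    if (match.Success) { hits.Add(match.Groups["method"].Value); break; }
                }
            }

            // The noisiest call sites surface first; these become the rule's test cases.
            foreach (var group in hits.GroupBy(h => h).OrderByDescending(g => g.Count()))
                Console.WriteLine($"{group.Count(),6}  {group.Key}");
        }
    }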
 
An example of a rule that I think would be difficult to implement as a code analysis rule is avoiding database queries inside large loops. This is a common pattern I have seen in poorly performing batch jobs: inside a loop of, say, 100,000 iterations there are 5, 10 or 20 database queries, so over the six hours of its run the job hammers a commodity database server with 2 to 3 million queries. When I discover this kind of thing I look for ways to avoid the per-iteration queries, for example by caching the data in lists or maps up front and reading it from there (a sketch of this refactoring follows below). But detecting the pattern is difficult, especially when the queries are buried in nested assemblies or modules called from within the loop. Then there is the problem that a loop may only run for about five iterations, in which case performing multiple database queries is not a problem and there is no need to refactor the code. Determining the number of iterations of a loop through static analysis is usually impossible, so this rule would end up either catching a very small percentage of these problems or misidentifying a large number of harmless ones, generally increasing the click-through rate.
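To make the refactoring concrete, here is a minimal sketch; the Order, Customer and ICustomerRepository types and their methods are hypothetical names used purely for illustration. The point is the shape of the change: one query per iteration versus one query up front and a dictionary lookup inside the loop.

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical types, for illustration only.
    record Order(int Id, int CustomerId);
    record Customer(int Id, string Name);

    interface ICustomerRepository
    {
        Customer GetCustomer(int id);            // one database round trip per call
        IEnumerable<Customer> GetAllCustomers(); // one round trip for the whole set
    }

    static class BatchJob
    {
        // Anti-pattern: one query per iteration; 100,000 orders means 100,000 round trips.
        public static void EnrichOrdersSlow(IEnumerable<Order> orders, ICustomerRepository repo)
        {
            foreach (var order in orders)
            {
                var customer = repo.GetCustomer(order.CustomerId);
                // ... process the order with its customer ...
            }
        }

        // Refactored: load the data once, then read it from an in-memory map inside the loop.
        public static void EnrichOrdersFast(IEnumerable<Order> orders, ICustomerRepository repo)
        {
            var customersById = repo.GetAllCustomers().ToDictionary(c => c.Id);
            foreach (var order in orders)
            {
                var customer = customersById[order.CustomerId];
                // ... process the order with its customer ...
            }
        }
    }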

Developers need to understand why a rule fired and what the solution is. If a developer does not understand why a code analysis rule has fired and why they are being asked to change the code, they will learn nothing from the experience except that code analysis rules suck. It is critical to the effectiveness of these rules that they are understood and come with clear instructions and examples of how to fix the offending code. In Visual Studio, code analysis warnings link to the rule's help page, so make sure that page is well documented!

Having said all that, I am not against code analysis rules in principle, far from it; they play a part in protecting code quality, but they need to be very carefully implemented and measured in order to guarantee their effectiveness and to keep the culture surrounding them positive. Measurement is vital, and the measure is the number of suppressions, both per rule and per person. A large number of suppressions on a rule can indicate either that the rule is poorly implemented and has a high false positive rate, or that developers do not understand it and either get frustrated because they don't know how to fix the code or decide it is a false positive and suppress it. Add time pressure to the mix and no wonder they clicked suppress.
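One cheap way to start measuring, as a sketch and assuming the suppressions are made with the standard [SuppressMessage] attribute in source (rather than in a global suppression file), is to scan the codebase and count suppressions per rule ID; combining this with source control blame would give the per-person view:

    using System;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;

    class SuppressionCounter
    {
        static void Main(string[] args)
        {
            // Root of the codebase to scan; defaults to the current directory.
            var root = args.Length > 0 ? args[0] : ".";

            // Matches e.g. [SuppressMessage("Microsoft.Design", "CA1062:Validate...")]
            // and captures the rule ID ("CA1062").
            var pattern = new Regex(@"SuppressMessage\s*\(\s*""[^""]*""\s*,\s*""(?<rule>[^"":]+)");

            var counts = Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories)
                .SelectMany(file => pattern.Matches(File.ReadAllText(file)).Cast<Match>())
                .GroupBy(m => m.Groups["rule"].Value)
                .OrderByDescending(g => g.Count());

            // The most-suppressed rules are the first candidates for review or retirement.
            foreach (var group in counts)
                Console.WriteLine($"{group.Count(),6}  {group.Key}");
        }
    }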

A large number of suppressions by a particular developer, on the other hand, usually just indicates a lazy developer. That is easily dealt with by adding alerts when developers suppress rules; when you see that one developer is suppressing rules more than others, you can sit with them and find out why. In my organisation we even had to threaten a developer with removing their source control permissions if they continued to flagrantly ignore the policy on rule suppression.

So if it is possible to create an effective rule that reliably catches dangerous or problematic code and clearly explains the reasons and solutions then I am 100% in favour. It just requires discipline, realism and continuous measurement to get there.