Paper - Experimental Assessment of Software Metrics Using Automated Refactoring

  • Metadata:
    • author: Mel Ó Cinnéide, Laurence Tratt, Mark Harman, Steve Counsell, Iman Hemati Moghadam
    • title: Experimental Assessment of Software Metrics Using Automated Refactoring
    • year: 2012

  • Essay
    • The paper I chose is titled “Experimental Assessment of Software Metrics Using Automated Refactoring” and was written by M. Ó Cinnéide, L. Tratt, M. Harman, S. Counsell and I. Hemati Moghadam. In the following paragraphs I will sketch out the problem they were trying to solve and what their approach consisted of. I will then discuss the technical contributions they made and answer the question of why this paper in particular appealed to me.
    • The problem and why it matters
    • To guide decision making in software engineering (both automated and manual), a variety of metrics is used to measure certain qualities of source code. These range from very concrete measures such as lines of code (LoC) to more abstract concepts like cohesion. As we move towards these higher-level metrics, it becomes progressively harder to argue precisely how they relate to each other and what exactly is being measured. The paper presents a first experimental approach to examining the relationships between such metrics by means of automated refactoring.
    • This is an important issue to discuss because choosing a specific set of metrics to assess certain qualities of a program is difficult if the connections and differences between them at a higher level are unknown. There is even a real possibility of metrics conflicting with each other, despite being supposed to measure a similar or even the same concept. In that case, a more detailed look at the relationships between these measures is required to provide further insight into how to use them.
    • As far as the paper states, no satisfactory prior approaches to this problem have been presented, and in turn no answers have been found to the questions regarding these relationships between metrics.
    • The presented approach
    • For their approach, they first limited their scope. To demonstrate their methodology, they looked at different metrics for measuring the cohesion of several Java codebases, i.e. how strongly the functionalities in selected parts of the source code relate to each other. They then used a metric-driven, search-based framework (normally used for design improvement) to apply randomly selected refactorings to randomly selected classes. To keep the search space manageable, they limited the changes to those that improved at least one of the chosen metrics. With each change, they measured every cohesion metric before and after the refactoring. This allowed them to observe how each metric reacted to these changes and to identify patterns in how the metrics agree or conflict under certain circumstances. They then used these patterns to reveal different properties of the metrics examined. A rough sketch of such a metric-guided loop is given below.
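    • To make this concrete, here is a minimal sketch of what such a metric-guided search loop could look like. The types CodeBase, CohesionMetric and Refactoring and all method names are hypothetical placeholders of my own, not the framework or API used in the paper.

      ```java
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;
      import java.util.Random;

      // Hypothetical sketch of a metric-guided search loop: apply randomly
      // chosen refactorings and keep only those that improve at least one
      // metric, recording every metric value before and after each change.
      interface CodeBase { }

      interface CohesionMetric {
          String name();
          double measure(CodeBase code);   // assumed: higher = more cohesive
      }

      interface Refactoring {
          CodeBase applyTo(CodeBase code); // returns the refactored code base
      }

      class SearchLoop {
          static void run(CodeBase code, List<CohesionMetric> metrics,
                          List<Refactoring> candidates, int steps) {
              Random rnd = new Random(42);
              for (int i = 0; i < steps; i++) {
                  Refactoring r = candidates.get(rnd.nextInt(candidates.size()));
                  CodeBase changed = r.applyTo(code);

                  Map<String, Double> before = new HashMap<>();
                  Map<String, Double> after = new HashMap<>();
                  boolean improvesSomething = false;
                  for (CohesionMetric m : metrics) {
                      before.put(m.name(), m.measure(code));
                      after.put(m.name(), m.measure(changed));
                      if (after.get(m.name()) > before.get(m.name())) {
                          improvesSomething = true;
                      }
                  }

                  // Keep only refactorings that improve at least one metric; the
                  // recorded before/after pairs are the data for later analysis.
                  if (improvesSomething) {
                      recordObservation(before, after);
                      code = changed;
                  }
              }
          }

          static void recordObservation(Map<String, Double> before,
                                        Map<String, Double> after) {
              // e.g. append to a CSV file for later agreement/volatility analysis
          }
      }
      ```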
    • The contributions made
    • There were quite a few major contributions to the field of software engineering made in this paper.
    • As stated in the paper, they presented an implementation of a new approach to measuring the relationships between different cohesion metrics using genetic algorithms (as described in the previous paragraph). While such algorithms have been used to improve source code in the past, the researchers here explicitly focused on the trends in the metrics caused by the various refactorings applied to the code base.
    • They also highlighted the conflicts between the different metrics that are supposed to measure cohesion. The metrics were observed to agree in less than half (45%) of the applied refactorings, and they outright conflicted with each other in 38% of the cases. This leads to a new insight from the paper: not only are there different kinds of cohesion being measured, but directly conflicting ones as well. A small sketch of how such agreement and conflict rates could be computed follows below.
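    • As a purely illustrative sketch of how such agreement and conflict figures could be derived from the recorded per-refactoring metric changes, the snippet below classifies each refactoring by the sign of the change two metrics show; the toy data and the handling of unchanged values are my own assumptions, not the paper's procedure.

      ```java
      // Hypothetical sketch: classify each refactoring as agreement, conflict
      // or "indifferent" for a pair of metrics, based on the sign of the
      // change each metric shows for that refactoring.
      class AgreementAnalysis {
          static void summarize(double[] deltaA, double[] deltaB) {
              int agree = 0, conflict = 0, indifferent = 0;
              for (int i = 0; i < deltaA.length; i++) {
                  int a = (int) Math.signum(deltaA[i]);
                  int b = (int) Math.signum(deltaB[i]);
                  if (a == 0 || b == 0) {
                      indifferent++;       // at least one metric did not react
                  } else if (a == b) {
                      agree++;             // both moved in the same direction
                  } else {
                      conflict++;          // they moved in opposite directions
                  }
              }
              int n = deltaA.length;
              System.out.printf("agree: %.0f%%, conflict: %.0f%%, indifferent: %.0f%%%n",
                      100.0 * agree / n, 100.0 * conflict / n, 100.0 * indifferent / n);
          }

          public static void main(String[] args) {
              // Toy data: per-refactoring changes of two cohesion metrics.
              double[] metricA = { +0.02, -0.01, 0.00, +0.05, -0.03 };
              double[] metricB = { +0.01, +0.02, 0.00, +0.04, -0.01 };
              summarize(metricA, metricB);  // agree: 60%, conflict: 20%, indifferent: 20%
          }
      }
      ```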
    • From this, the paper concludes that it is in fact impossible to use search-based refactoring to optimize for all cohesion metrics at once.
    • Other than these points, there were a number of smaller discoveries made in the paper regarding the specific cohesion metrics they were working with.
    • One observation concerned the volatility of the different cohesion measures. LSCC, CC and LCOM5 proved to be highly volatile, changing with nearly every refactoring that was applied. They also made a second point about the volatility of a metric, arguing that it depends to a large extent on the software system examined. The paper is not clear on whether this was the first time these phenomena were observed, but if so, this would serve as another contribution of the paper. One simple way to quantify such volatility is sketched below.
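    • Purely as my own illustration of the idea, a metric's volatility could be quantified as the fraction of applied refactorings after which its value changed at all:

      ```java
      // Hypothetical sketch: volatility of a metric = fraction of refactorings
      // for which the metric value changed by more than a small tolerance.
      class Volatility {
          static double volatility(double[] before, double[] after, double eps) {
              int changed = 0;
              for (int i = 0; i < before.length; i++) {
                  if (Math.abs(after[i] - before[i]) > eps) {
                      changed++;
                  }
              }
              return (double) changed / before.length;
          }

          public static void main(String[] args) {
              double[] before = { 0.40, 0.40, 0.55, 0.60 };
              double[] after  = { 0.42, 0.40, 0.51, 0.66 };
              // The value changed after 3 of 4 refactorings -> volatility 0.75
              System.out.println(volatility(before, after, 1e-9));
          }
      }
      ```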
    • The paper also looked into the effects of including and excluding inheritance from the LSCC and TCC metrics, i.e. whether all inherited methods and fields should be taken into account when calculating cohesion. Surprisingly, the two metrics show different relationships with and without inheritance, conflicting when inheritance is included and largely agreeing otherwise. This makes the inclusion of inheritance another factor to consider when choosing metrics. The paper does not explain the further implications of this, but states that it is in fact a new discovery. A simplified illustration of what including inheritance means for a cohesion metric follows below.
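    • To give a rough idea of what “including inheritance” means for such a metric, here is a much simplified sketch in the spirit of TCC (the fraction of method pairs that share at least one field); the representation of methods as sets of used field names and the includeInherited flag are my own simplifications, not the paper's definitions.

      ```java
      import java.util.HashMap;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Map;
      import java.util.Set;

      // Simplified, TCC-like cohesion: the fraction of method pairs that share
      // at least one field. Each method is represented only by the set of field
      // names it uses; includeInherited controls whether inherited methods are
      // counted as part of the class.
      class TccSketch {
          static double cohesion(Map<String, Set<String>> ownMethods,
                                 Map<String, Set<String>> inheritedMethods,
                                 boolean includeInherited) {
              Map<String, Set<String>> methods = new HashMap<>(ownMethods);
              if (includeInherited) {
                  methods.putAll(inheritedMethods);
              }
              List<Set<String>> uses = List.copyOf(methods.values());
              int pairs = 0, connected = 0;
              for (int i = 0; i < uses.size(); i++) {
                  for (int j = i + 1; j < uses.size(); j++) {
                      pairs++;
                      Set<String> shared = new HashSet<>(uses.get(i));
                      shared.retainAll(uses.get(j));
                      if (!shared.isEmpty()) {
                          connected++;   // this pair of methods shares a field
                      }
                  }
              }
              return pairs == 0 ? 1.0 : (double) connected / pairs;
          }
      }
      ```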
    • To conclude: while no solutions for metric conflicts and no meta-metrics have been introduced, the paper provides a framework for further examining the relationships between metrics, which could aid developers in choosing the right metrics for their specific use case.
    • My personal interest in the presented topic
    • This leaves one question to be answered: why did I choose this particular paper as my personal entry point into the field of search-based software engineering?
    • As I am really interested in the fields of data mining, machine learning and data science in general, I have dealt with software that analyzes all different kinds of data.
    • However, I have yet to work on projects that look at the multitude of data points software itself has to offer (thereby turning the solution domain into a problem domain itself).
    • Looking at all the trouble big companies seem to have with organizing projects and maintaining code bases, to me it just makes sense to apply our variety of tools and approaches to these problems. An important aspect of this is knowing what is being measured and what insights can be gained from such analysis, which is why a good understanding of metrics is so important. By offering a method to better analyze software quality metrics, this paper, in my opinion, shines a light on a really critical part of the “software-as-data” approach.
    • While the importance of the topic played a big part in my enjoyment of the paper, there were also a handful of smaller things I personally liked a lot. I am a fan of scientific discourse in general, so revelations like “We thought that these metrics measure the same thing, but here is data that suggests otherwise” are just entertaining to read. On top of that, while I got a high-level understanding of the concept of cohesion from my SWE class, seeing how different researchers tried to put it into numbers really deepened my understanding of cohesion.
    • Also, I enjoyed the fact that the paper talked about a specific tool used to automatically refactor code (Code Imp), as this had been a rather abstract concept for me until now.
    • Even though this has only been an entry point into search-based software engineering for me, I think I got a decent view of a particular challenge in the problem domain of SWE and widened my understanding of what this field is about. And although the genetic algorithm presented in the paper was not extremely sophisticated, it was still used in a real research project to great effect. Because of that, I am really excited to learn about further search-based algorithms in the next few weeks.