Design patterns are object-oriented software design practices for solving common design problems. They promote the reuse of proven designs and architectures rather than the reuse of code. The best-known work on design patterns in software engineering is the book published by the Gang of Four (GoF) in 1995. It catalogs 23 design patterns, each with its specific solution to a common design problem, its benefits, and its disadvantages.
Design patterns have many reported benefits, such as improved software quality, understandability, flexibility, reusability, extensibility, and maintainability, and reduced development time. They are widely accepted in the software engineering community, and their effects on software quality have been studied by many researchers. Riehle and Beck pointed out the benefits of using design patterns: they improved the documentation of software designs and made it easier to implement designs and comprehend source code. Prechelt et al. and Vokáč et al. performed experiments on software maintenance that compared design patterns with simpler alternative solutions. They found positive effects of employing design patterns: either maintenance time was reduced compared to the alternative solution, or additional flexibility was achieved without requiring more maintenance time.
Design patterns offer many benefits to software quality, but they also have disadvantages, as mentioned by the GoF, so they should be applied with care. They bring expert knowledge, but incorrect integration and use of the chosen pattern can overcomplicate the design and make maintenance harder. Bieman et al. examined several different programs, with and without patterns, and concluded that, contrary to common belief, the use of design patterns can lead to more change-prone classes during evolution. As a system evolves, a pattern may require changes in some of its parts, and those changes may leave the design pattern with missing parts. Changes to system parts should not break the constraints and properties of design patterns.
Some empirical studies also found that the use of design patterns may correlate with higher defect rates and more extensive changes. Previous studies by Vokáč and Aversano et al. have both shown that some design patterns have higher defect rates than non-pattern classes. Another study by Gatrell et al. found that pattern-based classes have a large number of lines of code (LOC) added for the correction of faults. Aversano et al. studied design pattern defects and scattered crosscutting concerns. They found that if patterns included crosscutting concerns, their defect rates could increase, since the implementation of a concern is involved in more than one design pattern.
In this research, we study the impact of design patterns on software defects by mining the repositories of open source software projects. Mining software repositories has recently emerged as a promising means of understanding software engineering practices. Many researchers have proposed mining software repositories as an effective way of learning about software development.
In our study, we first select 26 open source Java software projects. We then extract metrics for these projects from their repositories, including source code repositories and bug tracking systems. The metrics include data about design patterns and defects. They are stored in a metric database and analyzed using correlation and regression.
Our study extends previous studies by including many more software projects and using more comprehensive metrics. Previous studies   on design patterns and software defects used only a small number of software projects, from one to three. Our metrics include more defect data than previous studies.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces research problems and our proposed approach, along with its implementation including the system components and the process. Section 4 presents our study’s results with its design and analysis. Section 5 discusses threats to the validity of our study. Section 6 concludes our study along with discussion on future work.
2. Related Work
In this section, we first mention design pattern related surveys and mapping studies. Second, we discuss related work on design patterns and software quality in general. We then describe previous studies on design patterns and software defects.
2.1. Design Pattern Mapping
The following surveys and mapping studies examine the use of design patterns.
Ampatzoglou et al. performed a mapping study on GoF design patterns. The aim of their study was to present researchers with the research areas in the design pattern field and to categorize design-pattern-related work into subtopics. The most active research subtopics were pattern detection and the impact of GoF design patterns on software quality.
Zhang and Budgen  performed a mapping study on design patterns. They concluded that design patterns improve maintainability, but that their effects on software quality are not always positive.
In another study, Zhang and Budgen surveyed experienced users to identify which GoF design patterns they considered valuable. They concluded that only three design patterns (Observer, Composite, and Abstract Factory) were identified by experienced users as valuable.
Bafandeh Mayvan et al. performed a mapping study on design patterns to guide researchers toward active research topics in the field. They classified design-pattern-related publications into six research topic areas. Most publications are in the areas of Pattern Development, Pattern Mining, and Pattern Usage.
2.2. Design Pattern and Software Quality
In these studies, the effects of design patterns on software quality attributes, such as maintainability, reusability, testability, and extensibility, are examined. Software quality is measured by code metrics, including many metrics for object-oriented software, such as the number of classes, the depth of the inheritance tree, and the average number of methods in a class.
Ampatzoglou et al. studied the effects of design patterns on the quality and maintainability of software systems. In one study, they examined two open source games, one developed in Java and the other in C++. They looked at two different versions of the games, one without design patterns and one with design patterns. They found that using design patterns lowered coupling and complexity while increasing cohesion and the number of classes. Design patterns also produced code that was easier to understand, test, and maintain. In another study, they looked at the Bridge, Abstract Factory, and Visitor patterns. They observed that in three cases the pattern solution provided a more maintainable design, but that there were cases where the pattern was not the optimal solution. In general, the use of a pattern also produced a more extensible design.
Ampatzoglou et al. investigated the effect of design patterns on stability. Their study included 537 open source software projects and about 65,000 classes. They found that classes that participate in one design pattern are often more stable than classes that do not participate in any design pattern. In addition, classes that participate in more than one design pattern are less stable than classes that participate in one design pattern or in none. Some design patterns (Singleton, Facade, Mediator, Observer, Composite, and Decorator) are more resistant to the propagation of changes than others.
Elish  studied the impact of four structural design patterns (Adapter, Bridge, Composite and Facade) on stability. Results showed that Adapter, Bridge, Composite, and Facade design patterns all have a positive impact on stability.
Di Penta et al. investigated the change-proneness of classes participating in design motifs in three open source software projects (JHotDraw, Eclipse-JDT, and Xerces). They selected 12 design patterns (Abstract Factory, Adapter, Command, Composite, Decorator, Factory Method, Observer, Prototype, Singleton, State/Strategy, Template Method, and Visitor). They studied the change-proneness of class roles and the kinds of changes that occurred across different snapshots and releases of the software products. They found that, in all three software products, Abstract Factory classes in the Concrete Factory role changed more often than classes in the Abstract Factory role, and that Factory Method classes in the Concrete Creator role were more change-prone than those in the Creator role.
Huston  in his study selected Mediator, Bridge, and Visitor patterns and compared them with their non-pattern forms. He observed that the use of design patterns did not produce lower quality metrics.
Hsueh et al. developed an object-oriented quality model to validate whether a design pattern is well applied, for example, whether the intended structural model really resolves the quality problems.
Posnett et al. examined the effect of pattern role on change-proneness. They collected data from three open source software projects (JHotDraw, Xerces, and Eclipse JDT) and identified the pattern and meta-pattern instances. Most classes playing implementation roles were less change-prone when size was not taken into consideration, but more change-prone after size was accounted for.
Feitosa et al. investigated pattern grime in five industrial projects. Their findings suggest that pattern grime depends on the pattern type and the developer. They observed that Factory Method is more grime-prone, and Singleton is the least grime-prone, in comparison to other patterns. They also pointed out that developers who perform more changes on the software are less likely to accumulate grime.
Izurieta and Bieman studied the effect of grime accumulation on the testability of design patterns. They examined the Visitor, State, and Singleton design patterns in an open source software project called JRefactory. They found that the Singleton and Visitor patterns required more test cases in order to test new grime buildup. In the case of the State pattern, no significant grime buildup or decay was observed.
In another study, Izurieta and Bieman investigated design pattern decay, grime, and rot during the evolution of three open source software projects (JRefactory, ArgoUML, and eXist). Their results showed that there is little evidence of design pattern rot, but significant evidence of modular grime. Grime buildup has a negative impact on the testability and adaptability of design patterns. Most grime buildup occurred when the coupling of the classes increased.
Ampatzoglou et al. investigated the reusability of design patterns, classes, and software packages. They compared the reusability of identified classes with the reusability of the patterns and the packages to which these classes belong. Based on results from 100 open source projects, they found that the pattern-based approach provides statistically more reusable groups of classes. The results also suggested that, in most cases, reusing the design pattern offers the best option.
Ampatzoglou et al. built a repository to aid software developers and researchers. The repository helps users easily search for design patterns, projects, and reusable components in game development. They performed experiments with researchers and developers with different levels of experience. Their results showed that developers using the repository completed the given programming assignments with fewer defects. They also measured the time required to perform the tasks: researchers and developers using the repository completed them in a shorter time than developers using conventional methods. They pointed out that inexperienced users are more likely to benefit from using the repository.
Aversano et al. analyzed three open source systems in their study. The study aimed to learn how frequently object-oriented design patterns were modified and what kinds of changes they underwent. They discovered that the patterns that played a more important role in the software systems changed more frequently. They also found that large systems exhibited more changes in pattern implementation and fewer changes in method interfaces.
Some studies used controlled experiments to evaluate the quality of software solutions using design patterns.
Prechelt et al. performed an experiment on software maintenance that compared design patterns with simpler alternative solutions. They found positive effects of employing design patterns, in the form of shorter maintenance time and additional flexibility compared with the alternative solutions. In their replicated study, the results showed that the simpler versions of programs required less time to extend than their design pattern counterparts, especially for Abstract Factory and Composite. However, Krein et al., in their replication of the original experiment of Prechelt et al., found some contradictions when they compared their results with those of the original experiment. They concluded that they could not find any helpful impact of employing design patterns.
Vokáč et al. also performed an experiment to investigate the use of design patterns in maintenance. They found that Observer and Decorator required some training, but were easy to understand and shortened the maintenance time. Abstract Factory had a minor positive effect on the time required for the maintenance task and slightly improved quality. In the case of Visitor, the experiment showed that the programmers did not use the Visitor pattern to perform changes.
Ng et al. performed controlled studies on design patterns and maintenance. In one study, they investigated whether maintainers utilized deployed design patterns, and what kinds of tasks they performed when they used design patterns. They performed the study on 215 human subjects, requiring 6 changes in 3 programs. Their results revealed that design patterns were used by most of the subjects to complete the anticipated changes. In another study, they investigated the potential effects of design patterns on the productivity of maintainers. They performed the study on 118 human subjects, requiring 3 change tasks on a program. They found that previous exposure to the program and the presence of pattern-unaware solutions were strongly correlated with correctly completed maintenance tasks and time. They also concluded that neither prior exposure to design patterns nor prior exposure to the programming language was a significant factor.
Feitosa et al. investigated the energy consumption of the State/Strategy and Template Method patterns in two open source software projects and compared the results with alternative solutions. Their results showed that the alternative solutions used less energy in many cases. They found that design patterns provided a slightly more energy-efficient solution only when they implemented complex behaviors, such as larger methods and multiple calls to external classes.
Sahin et al. performed an empirical study to explore the effect of 15 GoF design patterns on energy usage. Their results showed that the Factory Method, Prototype, Bridge, and Strategy patterns have a moderate impact, while the Decorator pattern has a substantial impact on energy usage.
Bunse and Stiemer studied the impact of seven GoF design patterns (Facade, Abstract Factory, Observer, Decorator, Prototype, and Template Method) on the energy consumption of mobile Java applications. They concluded that Decorator and Prototype have a negative impact on energy consumption, while Facade, Observer, and Template Method showed no difference in impact.
Litke et al. analyzed the impact of six design patterns on energy consumption and performance. They concluded that there is no significant evidence that design patterns consume more energy.
2.3. Design Pattern and Software Defects
In the following studies, the use of design patterns and their effects on software defects are investigated.
In an early paper, Vokáč investigated the defects of classes that participated in selected design patterns in a large commercial software product. Five of the 23 GoF design patterns (Observer, Decorator, Singleton, Factory Method, and Template Method) were included in the study. The quantitative results showed that Factory Method was correlated with a lower defect rate. Template Method was mostly used in simple contexts and had a slightly lower defect rate. Observer was correlated with higher defect rates. When Singleton and Observer were both present in the same class, they were correlated with a higher defect rate. No significant correlation was detected for Decorator. The author concluded that, in the case of the Observer and Singleton patterns, their uses were often so complex that even correct usage and implementation of these patterns might not be enough to reduce the defect rate to average.
Aversano et al.  investigated whether the presence of defects in design patterns’ code was correlated with their induced crosscutting concerns. This study was a follow-up on their earlier study and the same three open-source systems were used  . They concluded that if a pattern included crosscutting concerns, defect rates of its classes could increase.
Gatrell et al. investigated whether design pattern classes had more faults than non-design pattern classes in a commercial C# software product. They selected 13 design patterns for the study (Adaptor, Builder, Command, Creator, Factory, Method, Filter, Iterator, Proxy, Singleton, State, Strategy and Visitor). They found that the Adaptor, Method, and Singleton patterns were more fault-prone than others. They also found that pattern-related classes had a larger number of lines of code added for the correction of faults.
Elish and Mohammed performed an empirical study on the fault density of classes that participate in design motifs. They did not find any clear tendency for the impact on fault density between participant and non-participant classes in design motifs. They found that creational and behavioral design motifs are more fault-dense than structural design motifs. In particular, Factory Method, Adapter, Composite, and Decorator show a negative association with fault density, while Builder shows a positive association.
Ampatzoglou et al. performed a study on the impact of design patterns on software defects. They included 97 open source Java games and 11 GoF design patterns to investigate the correlation between design patterns and software defects. They concluded that there is no correlation between the overall number of design pattern instances and defect frequency. There is also no correlation between the overall number of design pattern instances and debugging effectiveness. They reported that Adapter and Template Method are positively correlated with defect frequency, while the Abstract Factory, Singleton, Composite, Observer, State, Strategy, Prototype, and Proxy patterns are negatively correlated with defect frequency. They also reported that as the number of Observer and Singleton pattern instances increases, the number of bug-fixing activities decreases.
In summary, there have been many studies on the effect of design patterns on software quality. Some of these studies investigated the relationship between design patterns and software defects, focusing on the defect rates of design pattern classes. In those studies, software repositories were examined to identify defects related to design pattern classes. Like previous studies on design patterns and software defects, we extract metrics from software repositories. Unlike previous studies, we use many more open source software projects, with the exception of one study, which is limited to open source game projects, and we extract metrics about defects in addition to data about design patterns. This allows us to perform a more robust and comprehensive analysis. We also investigate the relationship between design pattern instances (DPIs) and defect priority by analyzing DPI metrics, which has not been studied previously. Table 1 shows the differences between our paper and previous works that have investigated the relationship between design patterns and defects.
3. Research Problems and Our Approach
The goal of this study is to understand the effect of design patterns on software quality. As software quality is directly related to software defects, we use metrics related to software defects as measurements of software quality. In this study, we investigate the relationship between design patterns and software defects.
The specific research problems we try to tackle are explained in Section 3.1. Section 3.2 describes our research approach to answer these research questions. The software metrics used in our research are summarized in Section 3.3.
3.1. Research Problems
To limit the scope of this study, we examine two categories of defect metrics: the first related to the number of defects and the second related to the priority of defects. As a consequence, we design two groups of investigations. In the first group, we investigate if design pattern instances are related to the number of defects. In the second group, we investigate if design pattern instances are related to the priority of defects.
In the first group of investigations, we examine the relationship between design pattern instance metrics and defect number metrics. We first examine if the number of design pattern instances in a project is correlated with the number of software defects in the project. In other words, does a software project with more design pattern instances have more defects? We perform various correlation analyses of design pattern metrics and defect number metrics. The design pattern metrics include individual design pattern instances as well as all design pattern instances together. The defect number metrics include the number of defects and defect rates (number of defects divided by lines of code, number of defects divided by number of classes).
Table 1. Comparison with previous work.
Next, we use regression to further examine how much design pattern instance metrics affect the defect number and defect rates. We investigate the effect of individual design pattern instances on the defect number and defect rates. We perform regression analysis on design pattern metrics and defect number metrics to see how much of the variation in defect number and defect rates is explained by the design pattern instance metrics.
In the second group of investigations, we examine design pattern instance metrics and defect priority metrics. We first investigate if the number of design pattern instances in a project is correlated with the priority of software defects in the project. In other words, does a software project with more design patterns have defects of higher priority? We also examine the individual design pattern instances and their correlation with defect priority. In addition, we use defect rates where the number of defects is divided by lines of code and by number of classes, respectively.
As in the first group, we use regression analysis to further examine how much design pattern instances affect the defect priority. We perform regression analysis on design pattern instance metrics and defect priority metrics to see how individual design pattern instances affect the defect priority. Again, the analysis is repeated using defect rates where the number of defects is divided by lines of code and by number of classes, respectively.
3.2. Our Approach
To answer the research questions listed in the previous section, we collect data from two kinds of software repositories: bug tracking systems and source code repositories. Metrics are then calculated from the data and analyzed to understand the relationships between design patterns and defects.
In this section, the system we build to extract and analyze metrics data is explained. We first introduce the system architecture and the process. We then explain each component of the system and the tools that are used in the components.
Figure 1 shows the overall system architecture. The system consists of four components: design pattern detector, bug report examiner, metrics calculator, and data analyzer. The design pattern detector extracts design pattern data from the source code repository. The bug report examiner extracts defect data from the bug tracking system. The metrics calculator computes design pattern metrics and defect metrics from the data extracted by the design pattern detector and the bug report examiner and stores them in a database. The metrics database is used by the data analyzer for various analyses.
Figure 1. System architecture and process.
Our study is carried out in four steps. In the first step, we select open source software projects and collect source code and bug reports from the software repositories of these projects. In the second step, design pattern data and defect data are extracted from the code and the reports by the design pattern detector and the bug report examiner, respectively. In the third step, metrics are calculated from the data collected in the previous step and compiled into a database, which is analyzed by the data analyzer in the fourth step.
The components of the system in Figure 1 are described below.
・ Design Pattern Detector
The design pattern detector identifies design pattern instances in source code. In our approach, we use the design pattern detector developed by Tsantalis. It uses a graph-matching-based approach to detect design patterns in Java bytecode. The tool is able to detect the following 12 design patterns: Factory Method, Singleton, Prototype, Adapter, Composite, Decorator, Proxy, Observer, State, Strategy, Template Method, and Visitor.
・ Bug Report Examiner
A bug report contains information related to a software defect, such as severity, priority, type, status, and comments. The bug report examiner scans the bug reports in the bug tracking systems and collects defect data from these reports.
The bug report examiner is implemented using a tool called Bicho. The tool retrieves bug and issue data from a bug tracking system. It is able to retrieve data from various bug tracking systems, including SourceForge, Bugzilla, Launchpad, and JIRA.
・ Metrics Calculator
The metrics calculator computes various metrics related to design patterns and defects from the data extracted by the design pattern detector and the bug report examiner. The metrics are described in Section 3.3.
・ Data Analyzer
The data analyzer examines the metrics extracted by the other three components. We use correlation analysis and regression analysis in examining the metric data. The data analyzer is implemented using IBM’s SPSS software package.
The metrics we calculate from the software repositories include design pattern instance metrics and defect metrics.
3.3.1. Design Pattern Instance Metrics
Design pattern instance metrics are software metrics related to design pattern instances and software size. There are two groups of metrics, the first related to the total number of design pattern instances in a software project, and the second related to the number of individual design pattern instances in a project.
Table 2 summarizes the metrics related to the total number of instances of design patterns.
Table 3 summarizes the metrics related to the individual design pattern instances.
There are two versions of implementation for the Proxy design pattern. They are denoted as Prx and Prx2. Instances of Adapter and Command design patterns have the same structure so they are hard to distinguish. Their instances are combined as AC. Similarly, the State and Strategy design patterns are similar in implementation and their instances are combined into StSt.
The metrics in Table 3 represent the number of instances of individual design patterns. They can be divided by LOC and NOC to derive other metrics. For example, FM/LOC and FM/NOC represent the number of Factory Method instances divided by lines of code and by number of classes, respectively.
Table 2. Metrics for total number of design pattern instances.
Table 3. Metrics for instances of individual design patterns.
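The derivation of normalized instance metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ProjectCounts` container, the `normalized_metrics` helper, the `DPI` key for the total count, and all numbers are hypothetical and are not taken from the study's tooling or data. The pattern abbreviations (FM, AC, StSt, Prx, Prx2) follow the ones used in the text.

```python
from dataclasses import dataclass


@dataclass
class ProjectCounts:
    """Hypothetical container for one project's size and pattern counts."""
    name: str
    loc: int          # lines of code (LOC)
    noc: int          # number of classes (NOC)
    instances: dict   # pattern abbreviation -> number of detected instances


def normalized_metrics(project: ProjectCounts) -> dict:
    """Return total and per-pattern instance counts, raw and divided by LOC and NOC."""
    metrics = {}
    total = sum(project.instances.values())
    metrics["DPI"] = total                      # "DPI" key is an assumed name
    metrics["DPI/LOC"] = total / project.loc
    metrics["DPI/NOC"] = total / project.noc
    for pattern, count in project.instances.items():
        metrics[pattern] = count
        metrics[f"{pattern}/LOC"] = count / project.loc
        metrics[f"{pattern}/NOC"] = count / project.noc
    return metrics


# Illustrative numbers only; AC and StSt are already combined counts.
example = ProjectCounts(
    name="demo-project",
    loc=50_000,
    noc=400,
    instances={"FM": 10, "AC": 25, "StSt": 40, "Prx": 2, "Prx2": 1},
)
example_metrics = normalized_metrics(example)
print(example_metrics["FM/LOC"], example_metrics["FM/NOC"])
```

Keeping the raw counts next to the normalized values makes it straightforward to repeat an analysis with either LOC or NOC as the divisor.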
3.3.2. Defect Metrics
Bug tracking systems usually include information about defects, such as priority, type, severity, status, resolution, comments, description, submitter, and whether the defect is assigned to someone. In our study, we use the number of defects and defect priority. Table 4 lists the defect metrics related to the number of defects and their priority in a software project, which are described below.
・ Number of Defects, Defect Rate by LOC, and Defect Rate by NOC: the total number of defects detected in a software project (Nbugs), Nbugs divided by the lines of code, and Nbugs divided by the number of classes.
・ Defect priority: a categorization of software defect to signify the degree of urgency to be fixed. A lower number indicates a higher priority to fix a defect.
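As a rough sketch of how these defect metrics could be computed from extracted bug reports, the Python fragment below derives Nbugs, the two defect rates, and a tally of defects per priority level. The function name, the input format (one priority value per bug report), and the per-priority tally are assumptions made for illustration; the exact priority metrics of Table 4 are not reproduced here, and the numbers are not study data.

```python
from collections import Counter


def defect_metrics(bug_priorities, loc, noc):
    """Compute defect count, defect rates, and a per-priority tally.

    bug_priorities: one priority value per bug report
                    (a lower number means a higher priority).
    """
    nbugs = len(bug_priorities)
    return {
        "Nbugs": nbugs,
        "Nbugs/LOC": nbugs / loc,
        "Nbugs/NOC": nbugs / noc,
        # Assumed summary: how many defects fall into each priority level.
        "by_priority": dict(Counter(bug_priorities)),
    }


# Illustrative values only.
priorities = [1, 3, 3, 2, 5, 3, 1, 4]
report = defect_metrics(priorities, loc=40_000, noc=320)
print(report)
```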
4. Results and Analyses
We selected 26 open source software projects for our study. The projects are all written in Java. To ensure that we have enough data for a project, all selected projects must have existed for more than 3 years. They must be active, meaning that they have been updated in the last 3 months. The bug tracking system of each project must be publicly available and provide bug-related information such as priority, description, and comments. In addition, the tool we used in our study (Bicho) must be able to collect the information from the bug tracking system without any error.
The projects are summarized in Table 5 below, with their name, lines of code (LOC), number of classes (NOC), and starting date.
4.1. Design Pattern Instances and Number of Defects
In this section, we investigate the relationship between the number of design pattern instances (DPIs) and the number of defects in a project. We first look at total DPIs and the number of defects, followed by individual DPIs and the number of defects.
We first compute the correlation between the total number of DPIs and the total number of defects. The Pearson correlation shows that, at the project level, the total number of DPIs in a project is not correlated with the number of defects in that project, with a correlation coefficient of 0.103 and a p-value of 0.618.
Table 4. Defect metrics.
Table 5. Software projects used in our study.
Since the number of DPIs and the number of defects may be related to project size, we then normalize both by lines of code (LOC) and by number of classes (NOC). The number of DPIs and the number of defects are each divided by LOC, and their correlation is calculated. The analysis is repeated using NOC as the divisor. The results are similar to those without normalization. Using LOC as the divisor, the correlation coefficient is −0.138 and the p-value is 0.501. Using NOC as the divisor, the correlation coefficient is −0.115 and the p-value is 0.574.
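The raw and size-normalized correlation analyses can be illustrated with a small, self-contained Python sketch. It is not the SPSS procedure used in the study: `pearson_r` and `t_statistic` are hypothetical helper names, and the project tuples are made-up values. The p-value itself is not computed here; it would follow from the Student's t distribution with n − 2 degrees of freedom applied to the statistic returned by `t_statistic`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical projects: (total DPIs, defects, LOC, NOC). Not study data.
projects = [
    (120, 300, 40_000, 350),
    (450, 280, 150_000, 1_200),
    (60, 90, 25_000, 200),
    (900, 1_500, 400_000, 3_000),
    (200, 150, 80_000, 700),
]

dpis = [p[0] for p in projects]
defects = [p[1] for p in projects]
locs = [p[2] for p in projects]
nocs = [p[3] for p in projects]

raw_r = pearson_r(dpis, defects)
loc_r = pearson_r([d / s for d, s in zip(dpis, locs)],
                  [b / s for b, s in zip(defects, locs)])
noc_r = pearson_r([d / s for d, s in zip(dpis, nocs)],
                  [b / s for b, s in zip(defects, nocs)])
print(raw_r, loc_r, noc_r)
```

For the reported coefficient of 0.103 over 26 projects, `t_statistic(0.103, 26)` is roughly 0.51, which is consistent with the large p-value reported for the unnormalized analysis.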
Table 6 summarizes the correlation analysis between total number of DPIs and the number of defects.
These results indicate that there is no significant correlation between the total number of DPIs and the number of defects. The normalized number of DPIs and the normalized number of defects are not significantly correlated either. In other words, taken as a total, the number of DPIs is not correlated with the number of defects.
To further investigate the correlation between the number of DPIs and the number of defects, we look at the number of instances of individual design patterns. We perform correlation analysis between the number of defects and the number of instances of individual design patterns, as listed in Table 3. The results are shown in Table 7. Other than the Proxy pattern, we do not find any significant correlation between the number of defects and the number of instances of individual design patterns.
We repeat the correlation analysis with normalized number of defects and normalized number of instances of individual design patterns, i.e., they are both divided by LOC and NOC, respectively.
As shown in Table 8, when the number of defects and the number of instances of individual design patterns are normalized by LOC, we do not find any design pattern whose normalized number of instances is significantly correlated with the normalized number of defects.
Table 6. Correlation between total number of DPIs and number of defects.
Table 7. Correlation between number of instances of individual design patterns and number of defects.
Table 8. Correlation between number of instances of individual design patterns normalized with LOC and number of defects normalized with LOC.
Table 9. Correlation between number of instances of individual design patterns normalized with NOC and number of defects normalized with NOC.
Table 9 shows the correlation results when the number of defects and the number of instances of individual design patterns are normalized by NOC.
The results are similar to the original numbers presented in Table 7. Proxy is the only design pattern whose normalized number of instances is significantly correlated with the normalized number of defects.
However, even though there is little correlation between the number of instances of individual design patterns and the number of defects, it is possible that a combination of them significantly affects the number of defects. We perform linear regression analysis using the number of defects as the dependent variable and the number of instances of individual design patterns as independent variables. The results show a strong relationship, with an R² value of 0.846 and a p-value of 0.002.
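A multiple linear regression of this kind, reporting standardized coefficients and R², can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the statistical package used in the study: the helper names (`zscores`, `solve`, `standardized_regression`) and the toy data are hypothetical, predictors are assumed not to be collinear, and coefficient p-values are not computed.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a sequence to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def standardized_regression(columns, y):
    """Fit y on the predictor columns; return (standardized betas, R^2)."""
    zx = [zscores(col) for col in columns]
    zy = zscores(y)
    k = len(zx)
    # Normal equations on z-scored data; no intercept is needed because
    # every standardized variable has mean zero.
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    betas = solve(xtx, xty)
    fitted = [sum(betas[j] * zx[j][i] for j in range(k))
              for i in range(len(zy))]
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(zy, fitted))
    ss_tot = sum(v ** 2 for v in zy)
    return betas, 1 - ss_res / ss_tot


# Hypothetical instance counts for two patterns and defect counts.
pattern_a = [3, 10, 7, 1, 12, 5]
pattern_b = [8, 2, 6, 9, 4, 11]
defect_counts = [400, 150, 260, 520, 130, 470]
betas, r_squared = standardized_regression([pattern_a, pattern_b], defect_counts)
print(betas, r_squared)
```

A negative standardized coefficient means that, with the other predictors held fixed, more instances of that pattern go with fewer defects; a positive one means the opposite.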
Table 10 summarizes the standardized coefficients and p-value of the linear regression.
It shows that six design patterns have a p-value below 0.05: AC (Adapter/Command), Ob (Observer), StSt (State/Strategy), TM (Template Method), Proxy, and Proxy 2. The number of instances of these six design patterns significantly influences the number of defects. In particular, Adapter/Command and Observer instances have a negative impact on the number of defects. This means that as the number of Adapter/Command and Observer instances increases, the number of defects decreases.
One possible explanation is that the use of the Adapter/Command and Observer design patterns improves design and code such that there are fewer defects. Our finding on the Observer pattern is similar to that of  : because the Observer pattern is complex to use, more experienced developers should implement it. The studies in  and  contradict our findings on the Observer pattern. In comparison to Gatrell et al.  , our results show that the Adapter/Command design pattern has a negative impact on the number of defects, while their study identifies the Adapter pattern as more fault-prone. Our finding on Template Method is similar to the result of Vokáč  , in which it tends to lower the defect rate.
Table 10. Linear regression of number of defects with number of instances of individual design patterns.
4.2. Design Pattern Instances and Defect Priority
In this section, we investigate the relationship between DPIs and defect priority by analyzing DPI metrics and defect priority metric. Of the 26 projects listed in Table 5, project JBPM does not have priority data in its bug tracking repository so it is excluded for this part of our study. We use data of the other 25 projects in Table 5 for analysis in this section.
The projects use two different scales for their priority value. Some use a 1 to 5 scale, and others use a 1 to 9 scale. We first use min-max normalization to linearly convert the values of projects using the 1 to 9 scale into a 1 to 5 scale. For example, a priority of 4 on the 1 to 9 scale is converted into (4 − 1)/(9 − 1) * (5 − 1) + 1 = 2.5. The average priority of defects in every project is then calculated.
We use average priority, denoted as AP, in the remainder of this section.
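The min-max conversion and the per-project average can be sketched as follows. The function names `rescale_priority` and `average_priority` and the sample priority values are hypothetical; only the formula itself comes from the text above.

```python
def rescale_priority(value, old_min=1, old_max=9, new_min=1, new_max=5):
    """Linearly map a priority from [old_min, old_max] to [new_min, new_max]."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def average_priority(priorities, scale_max):
    """Average priority (AP) of one project, expressed on the 1 to 5 scale."""
    if scale_max == 9:
        priorities = [rescale_priority(p) for p in priorities]
    return sum(priorities) / len(priorities)


# The worked example from the text: priority 4 on the 1 to 9 scale.
print(rescale_priority(4))  # 2.5

# Hypothetical defect priorities for two projects on different scales.
print(average_priority([1, 5, 9], scale_max=9))
print(average_priority([2, 3, 3, 4], scale_max=5))
```

The mapping preserves the endpoints (1 stays 1, 9 becomes 5) and the ordering of priorities, but it assumes that priority levels are evenly spaced on both scales.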
First, we perform correlation analysis between the number of total DPIs and AP. The Pearson correlation between the number of total DPIs and AP is 0.645, with a p-value reported as 0.000 (i.e., below 0.001). This indicates a moderate positive correlation between the number of total DPIs and AP. In other words, as the number of total DPIs in a project increases, its defects tend to have larger priority values, which means the defects have lower priority.
We also calculate the correlation between AP and the normalized number of total DPIs, using LOC and NOC, respectively. The results are similar to those without normalization. The results are summarized in Table 11.
Next, we perform correlation analysis between the number of instances of individual design patterns and AP. Table 12 summarizes the correlation analysis results.
From Table 12, we observe that four design patterns, AC (Adapter/Command), Prototype, State/Strategy, and Template Method, have a Pearson correlation coefficient above 0.5 and a p-value below 0.05. These four design patterns show moderate positive correlation with the average priority. As the number of instances of these four design patterns increases, the average priority value also increases, which means the defects have a lower priority on average. The analysis is repeated for the normalized number of instances of individual design patterns using both LOC and NOC. The results are similar, so they are not presented.
We then perform linear regression with average priority as the dependent variable and the number of instances of individual design patterns as independent variables. Table 13 shows the results of the linear regression. It has an R² value of 0.922 and a significance value of 0.000. It is reasonable to conclude that the number of instances of individual design patterns affects average priority.
Table 11. Correlation between number of total DPIs and average priority.
Table 12. Correlation between number of instances of individual design patterns and average priority.
Table 13. Linear regression of average priority using number of instances of individual design patterns.
From Table 13, it is apparent that only instances of three design patterns, AC (Adapter/Command), Proxy, and Proxy 2, have a p-value below 0.05. Other design pattern instances do not seem to influence average priority. Adapter/Command has a positive impact on average priority: the more Adapter/Command instances there are in a project, the larger its average defect priority value is. Proxy and Proxy 2 both have a negative impact on average priority: the more Proxy and Proxy 2 instances there are in a project, the lower its average priority value is, i.e., the higher the priority of its defects.
We also perform linear regression using the number of design pattern instances normalized by LOC and NOC, respectively. Table 14 summarizes the linear regression results using the number of instances of individual design patterns divided by LOC. The regression has an R² value of 0.929 and a p-value of 0.000.
From Table 14, we find that instances of six design patterns, Prototype, Singleton, Adapter/Command, Composite, State/Strategy, and Proxy 2, have a p-value below 0.05. Of these six design patterns, instances of Prototype, Adapter/Command, and State/Strategy have a positive impact on average priority, while instances of Singleton, Composite, and Proxy 2 have a negative impact on average priority.
Table 15 summarizes the linear regression results using the number of instances of individual design patterns divided by NOC.
The regression has an R² value of 0.906 and a p-value of 0.000. Table 15 shows that instances of five design patterns, Prototype, Adapter/Command, Composite, Proxy, and Proxy 2, have a p-value below 0.05. Instances of Prototype and Adapter/Command have a positive impact on average priority, i.e., more instances per class are associated with higher priority values. Instances of Composite, Proxy, and Proxy 2 have a negative impact on average priority.
For the three cases of linear regression analysis on instances of individual design patterns, using the number of instances, the number of instances divided by LOC, and the number of instances divided by NOC, respectively, we see some similarities and some differences. All three show that instances of Adapter/Command have a positive effect on average priority and instances of Proxy 2 have a negative effect on average priority. Also, instances of Prototype show a positive impact on average priority when using the normalized number of instances. Similarly, instances of Composite show a negative impact on average priority when using the normalized number of instances.
Table 14. Linear regression using number of instances of individual design patterns normalized with LOC.
Table 15. Linear regression using number of instances of individual design patterns normalized with NOC.
Based on the above investigations, we conclude that the number of instances of several design patterns has a significant effect on the number of defects and their priority.
5. Threats to Validity
There are several threats to the validity of our study. We discuss the serious threats in the following.
1) Not all design patterns are detected.
The design pattern detector used in our study  finds only 12 design patterns. Though it has been shown to be effective in that it recognizes all instances of these 12 design patterns with a low false positive rate, it does not detect other design patterns, of which there are many. For example, the Gang of Four (GoF) book  cataloged 23 design patterns, and many more design patterns have been cataloged since the book’s publication. It is almost certain that the projects we studied contain instances of other design patterns. If these 12 design patterns are good representatives of all design patterns, our results would apply to all design patterns. Otherwise, our results should be interpreted only in terms of these 12 design patterns.
One way to solve the problem is to improve the design pattern detector so that it can find more design patterns. We are actively looking for a more powerful design pattern detector.
2) Defects are accumulated over time.
The defect data are extracted from the projects’ bug tracking repositories. A project’s bug tracking repository contains all defects reported since the initiation of the project. Since the projects have different initiation dates, as presented in Table 5, older projects would normally have more defects than newer projects. This threat is somewhat alleviated by the fact that all projects are at least 3 years old. As observed by Kan  , at least half of a software system’s field defects are revealed in its first year of operation. It is reasonable to expect that the majority of a project’s defects are detected in the first 3 years.
3) Priority values may not be consistent among the projects.
As we described in Section 4.2, the projects in our study use two different scales for defect priority. Some use a scale of 1 to 5 and others use a scale of 1 to 9. We use min-max normalization to transform the 1 to 9 scale to the 1 to 5 scale. This may introduce some inaccuracy since priority values are not necessarily assigned linearly, e.g., a priority value of 4 does not necessarily mean twice as urgent as a priority value of 8. Even projects using the same 1 to 5 scale may not assign priority values consistently, e.g., a defect assigned a priority value of 5 in one project may be assigned a priority value of 4 in another project. A universal guideline for assigning priority values would help to eliminate this kind of discrepancy.
There are some other threats such as incomplete data in repositories, different application domains of projects, and different skill levels and capabilities of developers. We think these threats are minor and we do not discuss them in detail.
6. Conclusions and Future Work
In this study, we investigate the relationship between design patterns and software defects in a number of open source software projects. In particular, we analyze relationships between design pattern metrics and software defect metrics using correlation and regression analysis. In our first group of investigations, we find there is little correlation between the total number of design pattern instances and the number of defects. The number of instances of individual design patterns also does not correlate with the number of defects, except for the Proxy pattern. However, the numbers of instances of individual design patterns, taken as a group, have a strong influence on the number of defects. In particular, the number of instances of the Adapter/Command, Observer, State/Strategy, Template Method, and Proxy patterns has a significant impact on the number of defects.
In our second group of investigations, we find a moderate positive correlation between the number of design pattern instances and average defect priority. Moreover, the number of instances of Prototype, Adapter/Command, State/Strategy, and Template Method is positively correlated with average priority. When the patterns are considered as a group, the number of instances of Adapter/Command and Proxy 2 is found to have a significant effect on average priority. Prototype and Composite instances, when divided by LOC or NOC, also show a significant effect on average priority.
Design patterns have been widely used in software development. Our research extends previous studies on design patterns and software defects by using software defect metrics from bug tracking repositories, in particular, defect priority. There are many ways to extend our research in the future. We discuss some possible future work below.
In future studies, we can focus on a finer level of granularity, i.e., the role level of design pattern instances and classes.
Our study can also be extended to use other defect metrics, such as defect fixing time. It will be interesting to investigate how the number of design pattern instances affects other defect metrics.
The design pattern detector used in our study can only find 12 design patterns. One worthwhile research direction is to develop more powerful pattern detectors that can find more design patterns accurately and efficiently.
Since defects are related to complexity, another interesting research topic is to investigate the impacts of design pattern instances on software complexity. We are investigating design pattern instances and their impacts on software complexity metrics.