In recent years, several photovoltaic (PV) systems have been installed all over the world. PV systems are generally believed to be safe; however, several faults have been reported. Various monitoring techniques for PV systems have been developed so far. They are generally classified into onsite monitoring and remote monitoring. For small grid-connected PV (GCPV) systems such as rooftop PV systems, it is not efficient to carry out onsite monitoring of the systems one by one. Therefore, remote monitoring is more suitable than onsite monitoring for the initial diagnosis of small GCPV systems.
Remote monitoring methods for small GCPV systems have been developed. A user-friendly tool to detect and locate various types of faults correctly has been developed, and the use of OPC technology has been proposed. In addition, a method using distributed MPPT, which allows monitoring at the module level, has been developed. Recently, methods using machine learning have also been developed. For example, artificial neural networks (ANNs) have been used for both fault classification and modeling by regression. ANNs and fuzzy logic have been compared in several cases, and two parameters have been modelled in order to identify the number of partially shaded cells. A wireless sensor network based on a support vector machine classifier using historical irradiance has also been proposed. In this method, both outlier detection and solar power prediction are possible. Before detailed analyses such as the classification of faults, the use of a data mining method, principal component analysis (PCA), has been proposed. This work can contribute to efficient data analysis and visualization.
Most of these techniques need irradiance data measured by a pyranometer or a reference module. However, a rooftop PV system rarely has a pyranometer, and it is not realistic to install new ones in many situations. Therefore, the use of satellite-based irradiance is considered in this study. Although satellite-based data allow the acquisition of solar irradiance over large areas, these data have uncertainty due to their limited temporal and spatial resolution. Therefore, it is necessary to verify the uncertainty of the satellite data and conduct appropriate preprocessing before using them. This study proposes a method to detect a decrease in the output power of PV systems, taking the satellite data error into consideration. A high accuracy rate and early detection have not been achieved simultaneously in previous methods that use satellite data to detect a power decrease in PV systems (see Section 2.2). Therefore, we have focused on an accurate method that detects a power decrease within a shorter elapsed time. In addition, data extraction methods such as cluster analysis are considered in order to classify the decrease as temporary or permanent.
This paper is organized as follows: Section 2 presents the uncertainty of satellite data, some previous methods to detect faults using satellite-based data, and the results of analyzing the trend of satellite data used in this study. Section 3 describes our proposed method to detect faults. Section 4 describes the results. Finally, Section 5 provides the conclusion.
2. Satellite Data
2.1. Error in Satellite Data
Errors in satellite-based data have been observed in some previous studies. The difference between measured irradiance and that derived from Himawari-8, a Japanese geostationary weather satellite, has been evaluated. Although the accuracy of estimation based on Himawari-8 observation has improved compared to the estimation based on Himawari-7, problems such as the influence of albedo and radiation enhancement have remained. In addition, the results based on hourly and daily averages show a better correlation than the results based on instantaneous data. Other studies have found a general correlation between the overestimation of satellite-based estimates and the mean number of cloudy days. The uncertainty and bias of models that estimate PV performance without onsite measured plane-of-array irradiance have also been investigated. That study has shown that uncertainty and bias tend to increase when alternative data are used instead of measured ones, and that satellite data in particular contribute significant uncertainty and bias. These studies have shown that satellite data contain errors relative to measured irradiance, and it is necessary to consider them in our method.
2.2. Previous Study Using Satellite Data
Some studies have developed methods that consider the error of satellite-based estimation. In order to detect failures in GCPV systems, the expected energy yield calculated from the hourly satellite-derived irradiance has been compared with the actual yield. To conduct a more precise diagnosis, the GISTEL model and a fuzzy logic approach have been combined. In addition, a technique that sets an accumulation period and extracts the maximum points of the output power and the satellite-derived irradiance, respectively, has been reported. This method shows a high accuracy rate because the satellite data error is mitigated by using accumulated data. However, it is difficult to detect faults early because the accumulation period is about 30 days. Therefore, a method to detect a decrease in power with only one day’s data is proposed in our paper (see Section 3).
2.3. Trend of Satellite Data Used in This Study
Satellite data from Himawari-8 at 5-min intervals were used in this research. To design the fault detection method, errors in the satellite-based data were analyzed first. The analysis was conducted by comparing satellite-based and measured irradiance. Figure 1 shows the distribution of the percentage error, which is calculated by Equation (1), against the value of the measured irradiance.
$$\text{Percentage error} = \frac{I_S - I_M}{I_M} \times 100\ [\%] \quad (1)$$
where $I_S$ is the satellite-based irradiance, and $I_M$ is the measured irradiance.
This figure shows that the percentage error tends to decrease as the measured irradiance increases, which reflects the weather condition. The distribution of the percentage error appears to be asymmetric, presumably because of cloud variability that cannot be observed from the satellite. A previous study reported that the error of satellite estimation was coupled with a higher cloud fraction. Thus, it is inappropriate to use satellite-based irradiance without data processing because using all data, including those with significant errors, can lead to false positives or false negatives.
Figure 1. Comparison between measured irradiance and percentage error of satellite-based irradiance.
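The error analysis above can be illustrated with a short sketch. It assumes that the percentage error of Equation (1) is taken relative to the measured irradiance, so that a positive value means the satellite overestimates; the function name and the sample values are hypothetical and are not taken from the dataset of this study.

```python
def percentage_error(i_sat, i_meas):
    """Percentage error of satellite-based irradiance relative to the measurement.

    Assumption: the error is normalized by the measured irradiance, so a
    positive result means overestimation by the satellite.
    """
    if i_meas <= 0:
        raise ValueError("measured irradiance must be positive")
    return (i_sat - i_meas) / i_meas * 100.0


# Hypothetical 5-min samples in W/m^2: (satellite-based, measured)
samples = [(220.0, 200.0), (480.0, 500.0), (810.0, 800.0)]
errors = [percentage_error(s, m) for s, m in samples]
print([round(e, 2) for e in errors])
```

Because the error is normalized by the measured value, the same absolute deviation produces a larger percentage error at low irradiance, which is in line with the trend described for Figure 1.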
3.1. Categorization of the Day
The previous method, which sets an accumulation period, has not been able to detect a decrease in power early. To achieve both a high accuracy rate and early detection, this paper proposes a method to determine the days for which acceptable data are available to detect a decrease in output power. This method was developed considering the following features of satellite-based data. These features were described by Alessandro and were also identified in the simple analysis conducted in Section 2.
・ The error of satellite-based irradiance depends on climatic conditions. Better accuracy is obtained on sunny days.
・ Although individual instantaneous data contain overestimation and underestimation, the error is reduced by using long-period data.
Therefore, it is necessary to determine whether the data for a given day should be adopted or rejected. This involves two steps. First, a lower irradiance threshold $I_{THR}$ is set in order to determine whether each datum should be regarded as effective. For example, a threshold of 200 W/m² indicates that data over 200 W/m² are effective, while other data are excluded. This step is carried out every 5 min. Second, a lower threshold $N_{THR}$ for the number of effective data is set in order to determine whether the day should be regarded as a “Calculation day”. For example, a threshold of 30 indicates that a day containing more than 30 effective data is regarded as a Calculation day, while other days are excluded. This step is carried out once per day, after the first step has been conducted for the entire day. Eventually, the days are categorized into Calculation days and exclusion days. These steps are shown in Figure 2.
In Figure 2, X is the irradiance value from the satellite, Y is the number of effective data, and $I_{THR}$ and $N_{THR}$ are the thresholds of the irradiance value and the number of effective data, respectively. These values are optimized in Section 4.1.
Figure 2 shows that the days are categorized as Calculation day and exclusion days depending on the weather condition, and only one day’s data is used to detect the power decrease if the day is categorized as a Calculation day.
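The two-step categorization can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name, the default thresholds (the example values of 200 W/m² and 30 quoted above) and the synthetic days are assumptions.

```python
def categorize_day(irradiance_5min, i_thr=200.0, n_thr=30):
    """Return (is_calculation_day, effective_indices) for one day of 5-min data.

    Step 1: a sample is effective if its satellite irradiance exceeds i_thr.
    Step 2: the day is a Calculation day if it has more than n_thr effective samples.
    """
    effective = [k for k, x in enumerate(irradiance_5min) if x > i_thr]
    is_calculation_day = len(effective) > n_thr
    return is_calculation_day, effective


# Hypothetical days with 288 five-minute samples each (W/m^2)
clear_day = [0.0] * 100 + [600.0] * 40 + [0.0] * 148
cloudy_day = [150.0] * 288

clear_ok, clear_idx = categorize_day(clear_day)
cloudy_ok, cloudy_idx = categorize_day(cloudy_day)
print(clear_ok, len(clear_idx))
print(cloudy_ok, len(cloudy_idx))
```

Returning the indices of the effective samples makes it possible to reuse the same selection when the measured and expected power of a Calculation day are compared.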
Figure 2. Data flow to determine “Calculation day” and “Exclusion day”.
3.2. Power Decrease Detection
The evaluation index for fault detection is defined by the ratio of measured power to expected power, as shown in Equation (2). The expected power is calculated from satellite data and the system configuration, including the rated power, while the measured power is obtained from each home.
$$\text{Evaluation index} = \frac{P_M}{P_E} \quad (2)$$
where $P_M$ is the measured power, and $P_E$ is the expected power. $P_E$ is calculated using satellite data and the system configuration, based on the Japanese Industrial Standards (JIS), as shown in Equation (3).
$$P_E = K \times P_R \times \frac{G_{tilt}}{G_S} \quad (3)$$
where $K$ is the correction factor including the temperature factor and the system power conditioner factor, $P_R$ is the rated power, and $G_S$ is the standard irradiance. $G_{tilt}$ is the tilted irradiance, and it is calculated through the Perez transposition model. The evaluation index shown in Equation (2) is calculated based on the least squares method, i.e., the gradient of the measured power with respect to the expected power is calculated from the dataset of the Calculation day. Equation (2) indicates that if a system has no fault, the evaluation index is 1.0. If there is a 20% decrease in the output power of the system, the evaluation index is 0.8. Fault detection is carried out by comparing the evaluation index with the threshold. Although the threshold can be set arbitrarily, 0.9 is used to detect a 10% decrease in this study.
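The calculation of the evaluation index can be sketched as follows. Several points are assumptions made for illustration: the expected power is taken to have the form K × PR × Gtilt / GS, the least squares line is forced through the origin (the text does not state whether an intercept is fitted), the correction factor of 0.8 is a placeholder, and the standard irradiance is 1000 W/m². The tilted irradiance is given directly, so the Perez transposition step is not reproduced here.

```python
def expected_power(g_tilt, p_rated, k=0.8, g_std=1000.0):
    """Expected power from tilted irradiance; k = 0.8 is a placeholder value."""
    return k * p_rated * g_tilt / g_std


def evaluation_index(p_meas, p_exp):
    """Least squares gradient of measured vs. expected power.

    Assumption: the regression line passes through the origin, so the
    gradient is sum(PE * PM) / sum(PE^2).
    """
    num = sum(pe * pm for pe, pm in zip(p_exp, p_meas))
    den = sum(pe * pe for pe in p_exp)
    if den == 0:
        raise ValueError("expected power is zero for all samples")
    return num / den


# Hypothetical Calculation-day samples for a 4 kW system
g_tilt = [400.0, 600.0, 800.0]                   # W/m^2
p_exp = [expected_power(g, 4.0) for g in g_tilt]  # kW
p_meas = [0.8 * pe for pe in p_exp]               # simulated 20% decrease

index = evaluation_index(p_meas, p_exp)
decrease_detected = index < 0.9
print(round(index, 3), decrease_detected)
```

Fitting a single gradient over the whole day, instead of averaging instantaneous ratios, gives more weight to high-power samples, where the relative error of the satellite data tends to be smaller.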
3.3. Classification of the Fault
Power decrease is broadly classified into permanent decrease caused by failure and temporary decrease caused by partial shade. The three cases of the evaluation index every 5 min, i.e. normal, temporary decrease, and permanent decrease, are shown in Figure 3.
Figure 3. Evaluation index in the three cases.
In order to distinguish these decreases automatically, extracting the upper data of the evaluation index and recalculating the evaluation index were considered. This process was conducted after detecting a decrease in power. Three methods of extracting the upper data were compared in this study. The simplest method was to use the maximum point as the upper data. Extraction of a data group, instead of a data point, was possible with cluster analysis. Two types of cluster analysis methods were considered. The first method is the K-means algorithm, which is one of the most widely used clustering methods. The second method is the fuzzy c-means (FCM) algorithm. FCM differs from K-means in that FCM uses the membership degree. These methods, i.e., maximum point extraction, data group extraction by K-means, and data group extraction by FCM, are compared in Section 4.3. The flow chart of power decrease detection and classification of faults is shown in Figure 4.
If the day is categorized as a Calculation day (see Figure 2), fault detection is conducted. If the comparison between the evaluation index and the threshold indicates a decrease in power, fault classification is conducted by extracting the upper data and recalculating the evaluation index.
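The classification step can be sketched with a small one-dimensional K-means. The sketch rests on several assumptions that are not stated in the text: the clustering is applied to the 5-min ratios of measured to expected power, the centroids are initialized evenly between the minimum and maximum values, the recalculated evaluation index is the plain mean of the upper cluster, and a decrease is labeled temporary when this recalculated index is at or above the threshold. All names are hypothetical.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Plain 1-D K-means (k >= 2) with centroids initialized evenly over the data range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign every value to its nearest centroid
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels


def upper_group(values, k=2):
    """Values that belong to the cluster with the highest centroid."""
    centers, labels = kmeans_1d(values, k)
    top = max(range(k), key=lambda j: centers[j])
    return [v for v, lab in zip(values, labels) if lab == top]


def classify_decrease(ratios, threshold=0.9, k=2):
    """Label a detected decrease; the decision rule is an illustrative assumption."""
    upper = upper_group(ratios, k)
    recalculated = sum(upper) / len(upper)
    return "temporary" if recalculated >= threshold else "permanent"


# Hypothetical 5-min ratios: partial shade lowers three samples only
shaded = [0.98, 1.01, 0.99, 1.00, 0.55, 0.60, 0.58, 1.02]
# Hypothetical 5-min ratios: a failure lowers every sample
failed = [0.78, 0.80, 0.82, 0.79, 0.81, 0.80]
print(classify_decrease(shaded), classify_decrease(failed))
```

With three clusters instead of two, the same functions can be called with k=3, which corresponds to the alternative setting examined for K-means in this study.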
Figure 4. Data flow of fault detection and classification.
3.4. Target System and Data
Two types of data, i.e., satellite data and output power data, were used in this study. The data period is one year, from 2016 to 2017. Satellite data, including irradiance and temperature, were provided by Himawari-8. The spatial resolution was 1 km. The data were obtained every 5 min. Output power data were obtained from each household at 1-min intervals, and the mean value was used in order to ensure uniformity with the time resolution of the satellite data. Overall, data from approximately 340 systems were available. However, some of them included a power decrease. Therefore, 45 systems that appeared normal were used for parameter optimization. One of the 45 systems was used as a representative to evaluate the effectiveness of the proposed method and the parameters. One system, which has no fault but is affected by partial shade, was used to examine the three methods to extract the upper data, stated in Section 3.3.
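The alignment of the 1-min output power with the 5-min satellite data can be sketched as a block average. This is a minimal illustration under the assumption that the 1-min series starts on a 5-min boundary and has no gaps; the function name is hypothetical, and the handling of missing samples in the actual dataset is not described in the text.

```python
def five_minute_means(power_1min):
    """Average consecutive 1-min samples in blocks of five.

    A trailing block with fewer than five samples is dropped.
    """
    n_blocks = len(power_1min) // 5
    return [sum(power_1min[5 * i:5 * i + 5]) / 5.0 for i in range(n_blocks)]


# Hypothetical 1-min output power in kW (11 samples, the last one is dropped)
power = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 10.0, 10.0, 10.0, 10.0, 7.0]
print(five_minute_means(power))
```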
4. Results and Discussion
4.1. Parameter Consideration
As described in Section 3.1, two parameters, the irradiance threshold and the threshold for the number of effective data, were set in order to categorize each day as a Calculation day or an exclusion day. These parameters were optimized by testing various values. The output parameters used for evaluation were the “standard deviation (SD) of evaluation index in one year” and the “number of Calculation days in one year”. The input and output parameters are shown in Table 1. Overall, 49 patterns were tested, and the evaluation index was calculated for one year in each pattern.
A smaller SD results in a more stable evaluation index, thus contributing to higher accuracy. On the other hand, a larger number of Calculation days results in earlier detection. As stated in Section 3.4, 45 normal systems were used to determine the optimum parameters. Therefore, the evaluation index for one year was calculated for 49 patterns in each of the 45 systems.
Figure 5 shows the average results of the 45 systems in the 49 patterns. Thus, each point shows the average of 45 systems, and the number of points is 49.
This shows that as the input parameters are set more strictly, both the standard deviation and the number of Calculation days decrease. For example, both the SD and the number of Calculation days are the lowest when the input parameters are set to 700 W/m² and 70 times. The optimum input parameters were determined in order to achieve stable results and early detection simultaneously. In this study, an upper limit of 0.05 on the standard deviation was imposed on statistical grounds to guarantee the correct answer rate when detecting a 10% decrease (red line).
Accordingly, the input parameters were set to 500 W/m² and 50 times.
Table 1. Input and output parameters for optimization.
Figure 5. Average output in each input parameter and line to guarantee the correct answer rate.
4.2. Effectiveness of the Proposed Method
The effectiveness of the proposed method was evaluated for the following cases.
Case (a): No categorization, i.e. the days were not categorized as Calculation day and exclusion day.
Case (b): Existing method, i.e. the maximum values at each time were extracted from the 30-day period  .
Case (c): The proposed method, i.e. the daily output power and satellite-derived irradiance were used if the day was determined to be a Calculation day.
Figure 6. Annual result of evaluation index in the three cases.
Table 2. Mean value and standard deviation of the annual evaluation index.
Compared with Case (a), the method proposed in this paper, Case (c), shows better results from the viewpoint of stability, and consequently, in terms of accuracy. Compared with Case (b), the proposed method, Case (c), shows better results from the viewpoint of early detection. This is because Case (c) has 156 data points, i.e., fault detection can be conducted once in 2 or 3 days on average, while Case (b) requires 30 days to calculate one point. Particularly in Case (b), the mean value exceeded 1, which might have been caused by the overestimation of satellite-based irradiance. This overestimation may be related to extracting the maximum point each time. With regard to SD, it was high in Case (a), and sufficiently low to detect a decrease accurately in Case (b) and Case (c).
The accuracy rate for simulated decreases was calculated as a function of the decrease rate for each method. The decreased power was obtained by reducing the measured power by the simulated decrease rate, and the accuracy rate was calculated by dividing the number of correctly diagnosed days by the total number of days. These are given by Equations (4) and (5).
$$P_D = P_M \times (1 - DR) \quad (4)$$
$$AR = \frac{N_{CD}}{N_{TD}} \times 100\ [\%] \quad (5)$$
where $P_D$ is the decreased power obtained by simulation, $P_M$ is the power measured from the households, and $DR$ is the simulated decrease rate, such as 20%. $AR$ is the accuracy rate, $N_{CD}$ is the number of days on which the diagnosis was conducted correctly, and $N_{TD}$ is the total number of days on which the diagnosis was conducted. Both false positives and false negatives are included in the calculation of $AR$. Figure 7 shows the $AR$ calculated for $DR$ values from 0% to 50% in 1% steps, with the threshold set to 0.9 in each method.
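The accuracy rate calculation can be sketched as follows. The sketch assumes that the decreased power is the measured power multiplied by (1 − DR), so that each daily evaluation index scales by the same factor, and that a day counts as correctly diagnosed when a decrease is flagged if and only if DR is at least the target of 10%; the treatment of the boundary at exactly 10% is an assumption. The daily evaluation indices are hypothetical.

```python
def accuracy_rate(daily_indices, decrease_rate, threshold=0.9, target_decrease=0.10):
    """Accuracy rate in percent for one simulated decrease rate.

    A simulated decrease scales every daily evaluation index by (1 - DR).
    False positives and false negatives both count as incorrect days.
    """
    should_detect = decrease_rate >= target_decrease
    correct = 0
    for ei in daily_indices:
        detected = ei * (1.0 - decrease_rate) < threshold
        if detected == should_detect:
            correct += 1
    return 100.0 * correct / len(daily_indices)


# Hypothetical evaluation indices of a normal system on five Calculation days
daily = [1.02, 0.97, 1.00, 0.93, 1.05]

# Sweep DR from 0% to 50% in 1% steps
curve = [(dr, accuracy_rate(daily, dr / 100.0)) for dr in range(0, 51)]
print(accuracy_rate(daily, 0.0), accuracy_rate(daily, 0.08), accuracy_rate(daily, 0.20))
```

In this toy example the accuracy rate drops for decrease rates just below 10%, because days with an index slightly under 1.0 already fall below the threshold, which mirrors the low point near the 10% decrease rate in the accuracy curves.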
In Figure 7, a shortfall of the accuracy rate from 100% to the left of the 10% decrease rate indicates false positives, while a shortfall to the right of the 10% decrease rate indicates false negatives. There is a high probability of false positives and false negatives in Case (a) because the variance in the evaluation index was high, as shown in Figure 5. Moreover, there is a high probability of false negatives in Case (b) because the mean value of the evaluation index was slightly over 1.0. All three cases share a low point near the 10% decrease rate because the threshold is set to 0.9. However, this problem can be mitigated by considering the variance of the evaluation index, i.e., by determining whether the system shows a decrease in output power within a confidence interval.
Figure 8 shows the result of the proposed method considering the confidence intervals, such as ±σ and ±2σ.
Considering the confidence interval, the accuracy rate was improved in the vicinity of 10%. The confidence interval should be determined depending on whether to avoid the false positive or the false negative.
Figure 7. Accuracy rate in each method.
Figure 8. Accuracy rate in considering the variance of evaluation index.
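One possible way to take the confidence interval into account is sketched below. The decision rule is an assumption made for illustration, since the text does not specify it: a decrease is reported only when the whole interval lies below the threshold, a normal state only when the whole interval lies at or above it, and the result is left undetermined otherwise. The value of σ is passed in by the caller; 0.034 is the SD reported for the proposed method.

```python
def diagnose(ei, sigma, k=2.0, threshold=0.9):
    """Three-way decision using the interval [ei - k*sigma, ei + k*sigma].

    Assumed rule: act only when the whole interval is on one side of the threshold.
    """
    if ei + k * sigma < threshold:
        return "decrease"
    if ei - k * sigma >= threshold:
        return "normal"
    return "undetermined"


for ei in (0.80, 0.88, 1.00):
    print(ei, diagnose(ei, sigma=0.034))
```

A wider interval (larger k) leaves more days undetermined. Which side an undetermined day is assigned to then depends on whether false positives or false negatives are to be avoided.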
4.3. Effectiveness of Fault Classification
In order to classify the cause of a decrease in power into failure and partial shade, data extraction was considered. Data group extraction using cluster analysis is likely to show a more stable result than data point extraction because the calculation of the evaluation index includes multiple points. In contrast, calculation using a single datum is likely to provide unstable results because individual instantaneous data include overestimation and underestimation, as stated in Section 2. Among the data group extraction methods, the fuzzy c-means (FCM) method is likely to provide a more stable result than K-means because the extraction can be adjusted through the membership degree. For example, Figure 9 shows a comparison of the result of FCM and the result without extraction.
Although the system did not include faults, the evaluation index decreased during a certain period when no extraction was applied. This was caused by partial shade. In contrast, a stable result was obtained by using the FCM method even when the system was affected by partial shade. This suggests that fault classification can be carried out with data group extraction by FCM. Three methods, i.e., maximum point extraction, data group extraction by K-means, and data group extraction by FCM, were compared with respect to standard deviation, as shown in Figure 10. In K-means, two and three clusters were examined as the number of clusters. In FCM, various settings were examined by changing two parameters, the fuzzifier and the condition on the membership degree. Approximately 100 cases were calculated in total for FCM.
This result shows that the SD was not improved by maximum point extraction. Data group extraction by cluster analysis shows significantly better results than maximum point extraction. FCM shows better results than K-means if the two parameters, the fuzzifier and the condition on the membership degree, are set appropriately. However, these results are limited to the SD of only one system, and it is necessary to calculate the classification accuracy and to consider application to other systems.
Figure 9. Comparison of evaluation index obtained without extracting upper data and extracting by FCM.
Figure 10. Standard deviation of each method.
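The role of the two FCM parameters can be illustrated with a small one-dimensional implementation of the standard fuzzy c-means updates. The fuzzifier m and the minimum membership degree used to select the upper group correspond to the two parameters varied in this study, but the default values of 2.0 and 0.8, the initialization of the centers at evenly spaced points between the minimum and maximum, and the sample ratios are assumptions.

```python
def fcm_1d(values, c=2, m=2.0, max_iter=100, tol=1e-6):
    """1-D fuzzy c-means (c >= 2, m > 1); returns centers and the membership matrix."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        u = []
        for x in values:
            d = [abs(x - ck) for ck in centers]
            if min(d) == 0.0:
                # The point coincides with a center: full membership there
                row = [1.0 if dk == 0.0 else 0.0 for dk in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((di / dk) ** exponent for dk in d) for di in d]
            u.append(row)
        # Update each center as the mean weighted by membership^m
        updated = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            updated.append(sum(wj * x for wj, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(updated, centers))
        centers = updated
        if shift < tol:
            break
    return centers, u


def fcm_upper_group(values, m=2.0, min_membership=0.8):
    """Values whose membership in the highest-center cluster is at least min_membership."""
    centers, u = fcm_1d(values, c=2, m=m)
    top = max(range(len(centers)), key=lambda i: centers[i])
    return [x for x, row in zip(values, u) if row[top] >= min_membership]


# Hypothetical 5-min ratios of measured to expected power under partial shade
ratios = [0.98, 1.01, 0.99, 1.00, 0.55, 0.60, 0.58, 1.02]
upper = fcm_upper_group(ratios)
print(sorted(upper), round(sum(upper) / len(upper), 3))
```

Unlike K-means, which assigns each sample to exactly one cluster, the membership condition allows ambiguous samples between the two groups to be excluded from the recalculation; raising min_membership makes the extraction stricter at the cost of fewer samples.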
This paper presents a method of detecting a decrease in the output of PV systems with satellite-derived irradiance. Depending on the climatic conditions, each day was categorized as a Calculation day or an exclusion day. The parameters used to categorize the day were optimized by testing various values. By selecting the appropriate days for calculation with the optimized parameters, a more stable result was obtained than by calculating every day: while the SD was 0.125 when evaluating every day, it was 0.034 in the proposed method. Compared to the method in the previous study, our method requires a shorter time to detect a decrease in the output power of the PV system because only one day’s data are required. In the proposed method, the accuracy rate is nearly 100% when detecting significant decrease rates, such as 20% and 30%. In addition, the probability of false positives and false negatives is reduced by considering confidence intervals, such as ±2σ. Furthermore, three extraction methods to classify the faults were compared. Among these, the FCM method with adequate parameters showed the best results.
This paper is based on the results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO). We also wish to thank Dr. Takashi Oozeki of the National Institute of Advanced Industrial Science and Technology (AIST) for technical support.