differential orthorectification technique. This technique requires the elevation data to be available in advance. Hence, the geo-referenced DSM should be derived photogrammetrically for the study area before executing the orthorectification process. The generated orthophoto will be used as an external reference data source to validate and assess the quality of the geo-referenced building roof objects. This validation is proposed to visually inspect geo-referenced building roofs individually (object by object) in order to count the correctly geo-referenced objects (≥90% overlap with the correct rooftop). This is because the traditional orthorectification process does not work perfectly in dense urban areas. Although true orthorectification overcomes the limitation of the traditional orthorectification process, the generation of a true orthophoto is expensive, time consuming, and difficult to achieve  .
6. Data, Results, and Discussion
The results of the developed DECR method and its demonstrated applicability in EBD application are presented and discussed in this section. The section organization is based on the sequence of the proposed validation procedures described in Section 4 and Section 5.2 for the DECR and the EBD methods respectively. The test data used in the experimentation are described in the following subsection.
6.1. Dataset and Study Area
The optical data used in this research are a subset of five stereo VHR images acquired by the WorldView-2 linear sensor with push-broom scanning mode. These MVIS images are available along with their sensor model information. Each of the five MIS-VHR satellite images has eight spectral bands of 1000 × 1000 pixels and a panchromatic band of 4000 × 4000 pixels with 2 m and 0.5 m resolutions, respectively.
The VHR data were acquired in 2010 over a dense urban area of Rio de Janeiro, Brazil. The imaged area is of a modern city and has many buildings with different shapes and sizes. Most of the buildings are high-rise elevations where building lean and façades are prominent. Figure 6 shows the selected reference stereo pair (i.e., the reference image and its stereo mate).
6.2. Results of DECR Method
6.2.1. Epipolarity and Inter-Proportionality Result
Based on the developed DECR method, the available VHR images of the test data are re-projected onto two horizontal planes at the average-terrain elevation (i.e., Zavg = 10 m) and the zero elevation (i.e., Z0 = 0 m). To simplify the calculations, the resampling of the re-projected images was made to be equivalent to 1 m2 per pixel. After that, the offset information in the X and Y direction of the object-space coordinate system was calculated for each MIS-VHR image, as shown in Table 1.
Figure 6. The reference stereo pair of the MVIS-VHR satellite images. (a) The reference image (I-2); (b) the stereo mate (I-3).
Table 1. The calculated shift information required for the DECR methoda.
a. Thevalues are in meter (ground pixel size = 1 m).
To confirm the epipolarity condition of the epipolar rectified images, the SIFT matching technique was executed to generate a set of 35 matching points of good distribution. Then the differences in the matching location were calculated between the point pairs to assess the quality of the epipolarity condition. The RMSE value of these differences was found to be 0.29 pixels with a Std. value of 0.23 pixels. These values validate the approximation of the epipolar curves and indicate the success of the epipolar rectification process for the employed MVIS-VHR satellite images.
Table 1 shows the minimum, maximum, and the average shift values in the X and Y directions respectively. It can be noted that the variation of the values is very small (<1dm) due, mostly, to the small size of the test area. In this case the average values can be used as a generalized value for the whole test area―to simplify the calculations―instead of using the individual values of each ground pixel which is required to overcome any factors that prevents the disparity inter- proportionality including the shifts and drifts in the trajectory of satellite sensors.
To align the rows of the images with the epipolar direction, the rotation angle of the epipolar direction was determined by connecting a distinct elevated point to its corresponding matching points and then determining the epipolar angle relative to the ground X-axis as described in  . Thus, for this dataset, the rotation to the epipolar direction was found to be 76.4231 degrees.
To construct the epipolar images and keep their inter-relationships, the projected images were rotated using the calculated epipolar angle. Additionally, the calculated offset data were rotated based on the same angle to find the offsets in the epipolar direction and its perpendicular (i.e., ΔX', ΔY'). The offset values in this direction were all found to be less than 1 pixel (or 1 meter) with a maximum range of about 0.75 pixels. This indicates quantitatively the validation of the computed epipolar angle, the calculated offsets in the epipolar direction, and the approximated epipolar lines of the constructed epipolar images. This validation is demonstrated visually in Figure 7. The same point and its corresponding points lie on the same horizontal line in all epipolar images.
To validate the inter-proportionality of the disparity values among all of the epipolar images, the validation test described in Figure 4 was implemented. Figure 8 is constructed based on Equation (5). The figure proves the validity of the disparity proportionality among all epipolar images. Two straight lines of two different points indicate the success of the constructed disparity proportionality. This validation confirms the proportionality of the disparity values with their corresponding elevations.
6.2.2. Calculated Elevations Results
Since the scale-relationship of the disparity values was achieved and validated, supplementary surface disparity data (SDMs) were generated and fused using the 3D-median filter as described in Section 3.4. These supplementary disparity data were used to fill disparity gaps of the reference disparity map. By fusing the generated SDMs with the reference one, the SDM of the selected reference pair (i.e., Ep-2 & Ep-3) was enriched and enhanced. Figure 9 shows different SDMs extracted from different epipolar stereo pairs which are fused together to generate a more accurate and enriched SDM. The SDM achieved has an identical representation of its corresponding LoS-DSM.
As stated previously, the offset variation is almost negligible due to the small size of the test area. Thus, to simplify the calculations, the average offset of each image in the epipolar direction was used to calculate the DEF formula―as in Equation (4)―instead of the offset value for each individual pixel which must be considered in the case of full scenes and hilly areas.
Based on Equation (4), the scale value calculated for the selected reference stereo pair (I2-I3) is found to be −1.912. Hence, the elevation data corresponding to the enriched SDM of the reference stereo domain were directly generated for the selected reference stereo pair (I2-I3). Since these elevations are measured from the projection plane (Zavg), the height above the datum of that projection plane (ΔZ = 10 m) is added to have the elevation referenced to the datum. Hence,
Figure 7. Epipolarity validation. Any point and its corresponding ones lie on the same horizontal line in all epipolar images.
Figure 8. Disparity inter-proportionality validation among all epipolar images.
Figure 9. The fused and enhanced SDM generated from different epipolar stereo pairs based on the scale transformation formula derived in the DPP method. Unlike the rest of the pairs, the pair Ep-2& Ep-1 has negative values because the stereo mate is before the reference image in the sequence of acquisition
the disparity-based derived elevations are computed as Elv. = 10 − 1.912 × DI2,I3. The resulting elevation data represent the disparity-based LoS-DSM co-regis- tered to the reference image with pixel-level accuracy as per the disparity definition.
For the evaluation of the disparity-based elevation calculation, a set of tie points that represent the centres of different flat building-rooftops were matched manually in order to correctly triangulate their 3D ground coordinates. The resulting ground elevations were then compared against their corresponding values generated based on the DECR method. The photogrammetric elevations were calculated using industry leading commercial photogrammetric software (PCI Geomatica, ver. 2015).
Based on a manual selection of the building roof elevations in the scene, the RMSE and Std. were calculated and found to be 1.54 m (1.5 pixels) with a Std. value of 1.06m (1 pixel). The RMSE value is reasonable because it is less than the pixel-level threshold value, used to identify the inconsistency and outliers among the corresponding supplementary disparity values, generated from different epipolar stereo pairs, before fusing them. Moreover, this threshold is almost double the Std. value. Hence, this indicates the highly acceptable precision of the achieved RMSE value. Therefore, the developed disparity-based approach is able to produce elevation data comparable in accuracy to that derived rigorously based on the traditional photogrammetric approaches using industry leading commercial software.
6.2.3. Co-Registered LoS-DSM Result
Based on the DECR method, the resulting LoS-DSM co-registered to the selected reference image needs to be validated to identify the elevation inconsistency through visual inspection. Figure 10 shows an isometric view of the generated LoS-DSM and the rendered representation of the co-registered reference image. While Figure 10(a) shows clearly the elevation consistency for each building roof, Figure 10(b) confirms the predefined co-registration accuracy with the reference optical data. The realistic 3D representation of the dense urban area is attributed to the successful disparity-based image-elevation data co-registration achieved using the developed DECR method.
Figure 10. The reference stereo pair of the MVIS-VHR satellite images. (a) The reference image (I-2); (b) the stereo mate (I-3).
6.3. Results of DECR Method
To demonstrate the applicability of the developed DECR method, an elevation-based building detection application was implemented to map the building-roof objects based on the accuracy of the elevation derivation, terrain elevation minimization, and the pixel-level co-registration with the reference optical image.
The building detection is mainly based on the elevation information. Because trees produce false detection results, a mask was generated based on the NDVI value to exclude all of the vegetation objects. The threshold value of this index was selected empirically and was equal to 0.3. It is worth mentioning that this NDVI-based vegetation removal may affect negatively the detection result in the cases of roof gardens or buildings with high NDVI values. These two cases pose a limitation to this vegetation removal technique. However, these cases rarely appear in most urban areas.
Since the reference image was selected to be off-nadir over a challenging and dense urban area, the buildings’ façades are prominent. To exclude these confusing building objects, the Left-Right occlusion detection technique was executed to identify the hidden areas which mainly represent in our case the building façades. Figure 11 illustrates an example of the performance of this technique (Figure 11(c)). The whole façade bitmap generated for the selected epipolar stereo images (Ep-2 & Ep-3) is provided in Figure 11(d). This mask is used to enhance the building detection results by removing the building sides and highlighting the building rooftop objects only.
The intermediate detection results are provided in Figure 12. The co-regis- tered optical and elevation datasets are shown in Figure 12(a) and Figure 12(b) respectively. Figure 12(c) demonstrates the detected off-terrain objects based on a thresholding operation. The threshold value was very close to the selected terrain average elevation (Zavg) because the elevations were referenced to the datum. This indicates the success of the terrain variation minimization described in the developed DECR method. The detection result of Figure 12(c) includes the tree objects which are filtered out in Figure 12(d) based on an NDVI bitmap
(a) (b) (c)(d)
Figure 11. The generation of the occlusion map for the reference stereo images (I-2 & I-3). (a) A subset from the epipolar reference image (Ep-2); (b) a subset from the epipolar stereo image (Ep-3); (c) the detected occlusions in the reference image; (d) the generated occlusion map for the epipolar reference image (Ep2).
(a) (b) (c) (d)
Figure 12. The intermediate building detection results. (a) Epipolar off-nadir VHR reference image (Ep-2); (b) disparity-based co-registered LoS-DSM using DECR method; (c) detected off-terrain objects based on a thresholding operation of a value close to Zavg; (d) resulting objects after suppressing vegetation objects based on an NDVI bitmap; (e) resulting objects after removing the building façades. (f) The manually generated reference data for comparison.
produced based on the empirically selected threshold value. This result is enhanced further by applying the Left-Right-based occlusion map of Figure 11(d) generated to exclude the building façades. Some post-processing steps were applied to enhance the detection result. These steps include merging adjacent objects and removing isolated ones of small areas since they usually represent noise. The final enhanced detection result is shown in Figure 12(e). The reference data used for evaluating the detection results are provided in Figure 12(f). These data were generated manually. The quantitative evaluation of the final detection result relative to the reference data is provided in Table 2.
Based on the performance measure provided in Table 2, the detection was highly successful. This high quality detection is due to the incorporation of the elevation information. This information is a critical detection component for the
Table 2. Performance evaluation of the building detection result.
buildings since they are inherently elevated objects.
The high correctness value of 95% is attributed mainly to the high accuracy of the derived elevation data (based on a fused supplementary disparity maps) used for the detection and the accurate co-registration of these elevation data with the reference image. This performance indicates the success of the developed DECR method. Furthermore, the quality of the elevation data generation and co-regis- tration is reflected also in the high completeness of 96%. However, the role of incorporating the façades bitmap based on the Left-Right checking technique reduced the false detection tremendously, which resulted in a higher completeness measure value.
As a combined indicator of both the correctness and completeness performance measures, the overall-quality (OQ) measure confirms the high quality of the building detection results. This quality can be seen by comparing visually the reference data in Figure 12(f) to the final detection results in Figure 12(e). A building detection result in off-nadir VHR imagery of 92% overall quality over a challenging dense urban area is indeed a significant success that demonstrates the applicability of the developed DECR method in an EBD application.
6.4. Results EBD Map Geo-Referencing
After mapping the building-roof objects and evaluating the detection performance, the resulting building objects must be geo-referenced to their correct ground locations in order to be ready for incorporation in GIS systems and integration with other GIS layers. The geo-referencing process described in Section 5.1.5 was implemented to remove the perspective effects of the off-nadir VHR image and then apply the geo-referencing. The correct geo-location in the epipolar direction was calculated based on Equation (5) using the derived elevations, referenced from the projection plane (Zavg), at the RP point of each building object. For the selected reference image, the values used for ΔZ is10m and for ΔX’ is −6.7352 m.
The achieved geo-referencing result was evaluated as described in Section 5.2.2 based on an orthoimage. Therefore, a dense geo-referenced DSM was derived photogrammetrically based on the traditional approach and then used to create an orthoimage for the study area. This orthoimage was used as an external reference to assess the accuracy of the building map geo-referencing process.
Figure 13(a) illustrates the detection result in the epipolar image domain overlaid on the epipolar reference image (I-2). Figure 13(b) shows the geo-ref- erencing result overlaid on the generated orthoimage. Figure 13(c) shows the enhanced building detection results geo-referenced to the correct object-space locations and prepared as a GIS layer.
(a) (b) (c)
Figure 13. The result of the building detection in both the epipolar image domain and the geo-referenced ground domain. (a) The detected building objects in the epipolar reference image domain; (b) geo-referenced building objects overlaid on the generated orthoimage; (c) the resulting GIS layer.
After a qualitative assessment of the generated orthoimage, it is found that 98% of the building roofs were geo-referenced correctly. This confirms and validates further the derived elevation data based on the generated disparity information. In contrast, 2% (3 out of 165 objects) of the geo-referenced building objects were found lying on the incorrect geo-location. The reason for that is the incorrect elevation used to calculate this geo-location in the epipolar direction. This elevation was extracted at the RP point of the building object as described earlier. For non-flat building roofs, the location of the RP selected in our study may not be at the elevation value that leads to the correct geo-reference location. These cases are very few and they can be easily identified and corrected manually.
Figure 14 illustrates four cases of the achieved geo-referencing results. On the one hand, the upper row of this figure shows the detection result in the epipolar image domain (Ep-2) along with the correct geo-location of the detected building object in the epipolar direction. The shift of the detected objects to the correct location in the epipolar domain is equal to the distance between the RP point of the detected building (represented by a triangle in Figure 14) and correct building object location (represented by a square in Figure 14) as indicated in the same figure.
On the other hand, the lower row of Figure 14 shows the geo-referencing result overlaid on the generated orthoimage. Based on the visual assessment, the geo-referencing result is highly successful. For Figure 14(d2), despite the misalignment between the building roof in the orthoimage and the corresponding geo-referenced object, the location of the geo-referenced roof object is correct. This is because the lean of the high-rise building (more than 135 m above the local ground) was not completely eliminated from the orthoimage. Hence, the source of this misalignment is in the reference data, not the geo-referenced objects. Figure 14 indicates the success of the proposed geo-referencing process. The final geo-referencing result was processed to be directly integrated into a GIS system. At this stage, the cycle of the information extraction and mapping is
Figure 14. A few examples of the geo-referenced roof objects for visual assessment. The upper row shows the detected roof objects along with its RP location. The same row shows the calculated correct location of the object to be moved to it along with its new RP point.
completed for the application of building detection in off-nadir VHR images.
In this study, the problem of optical-elevation data co-registration is addressed. The introduced solution is based on generating disparity-based elevation data co-registered with one of the employed VHR images. The co-registration is achieved with pixel-level accuracy using an improved approach for developing a LoS-DSM from an enriched disparity map. This map is generated from different MIS-VHR satellite images by fusing supplementary disparity data derived from different stereo-pair combinations as described in our recently introduced DPP method. Then, the LoS-DSM elevations are derived efficiently based on the DEF (Disparity-to-ElevationFormula) of the DECR method developed in this study.
The developed DECR method, which is an extension of DPP method, is based on establishing a scale-based relationship between the disparity and elevation data in the object space. The core concept of DECR is to re-project the MIS- VHR images onto an object-space horizontal plane at the average terrain elevation of the imaged area. This re-projection is mainly to build linear proportionality among the relevant object-space disparities and with their corresponding elevations. This property allows generating supplementary disparity data to be used for filling the disparity gaps of the reference SDM. Additionally, the property allows the direct derivation of object-space elevations from the enriched (gap-free) SDM. These derived disparity-based elevations represent the LoS-DSM co-registered to the reference optical image selected. As an extension of the DPP method, the DECR method includes analytical derivation of two formulas: (1) Disparity-to-Elevation Formula (DEF) that provides the elevation values and (2) the Correct Distance Formula (CDF) that provides the shift to the correct geo-referenced location of the object-space pixels in the epipolar direction.
The developed DECR method was successfully validated in terms of the epipolarity condition of the projected images, the disparity proportionality among the projected images, the elevation accuracy derived based on the generated disparity data, and the elevation co-registration accuracy to one of the employed optical images.
For applicability demonstration purposes, an EBD procedure was developed and implemented to detect building roofs using the achieved disparity-based LoS-DSM. The detection result for the off-terrain objects was achieved by a threshold value close to zero above the selected average terrain level. This represents an advantage over the original LoS-DSM algorithm. When the detection was evaluated, the result was found to be very successful based on the traditional performance measures (Completeness, Correctness, and Overall Quality). This indicates an effective minimization of the terrain-relief variation which reduces the need for an elevation normalization algorithm in dense urban areas of moderate terrain variation.
The overall quality measure was found to be 92%, proving the successful accuracy of both disparity-based elevation data calculation and the optical-eleva- tion data co-registration. While the 95% correct detection is attributed to the high quality of the elevation data generation (based on enriched and enhanced disparity maps) and incorporation, the 96% complete detection resulted from the refined detection accuracy based on the occlusion and façades bitmap generated using the Left-Right checking technique.
In addition to the high detection quality achieved, the generated building-roof objects were geo-referenced to their correct object-space location to allow direct integration with existing GIS layers. The geo-referencing process is based on moving the detected building roofs to the correct ortho-location in the epipolar domain of the reference image. The calculated locations were validated using an orthoimage generated for the study area using a commercial photogrammetric software package (PCI Geomatica). The evaluation result was almost 100% based on the visual assessment of the geo-referenced roof objects of which ≥90% overlap with the correct ground location of the corresponding building roofs in the orthoimage. This significantly successful geo-referencing result of the generated building roof map demonstrates the accuracy of the derived disparity-based elevation data. Furthermore, it allows a direct integration with the existing layers of the GIS systems.
The identified limitation in the developed DECR method is its applicability to only the MVIS-VHR images acquired by push-broom linear sensors. Moreover, the terrain variation minimization by re-projection onto the average terrain level may not be sufficient in hilly and mountainous areas. For the developed EBD procedure, the identified limitation is the vegetation detection and removal based on NDVI-based threshold. This vegetation index may produce some errors in the cases of roof gardens and buildings with high NDVI values.
Therefore, future work should address these limitations and extend the DECR technique to be applicable in a tile-based approach in order to work even on hilly and large areas.
This research is funded in part by the Libyan Ministry of Higher Education and Research (LMHER) and the Canadian Research Chair (CRC) program. The optical data used in this research are provided by Digital Globe for the IEEE-IGARSS 2011’s Data Fusion Contest. They also greatly appreciate thevaluable comments from the anonymous reviewers and theeditor for improving this paper.