Vegetation regulates biogeochemical cycles, energy balance and climate, ameliorates soils, and supplies oxygen, energy and habitat to animals. Vegetation classification based on physiognomy (structural characteristics: tree, shrub, herbaceous; or leaf characteristics: evergreen or deciduous, needle-leaved or broad-leaved) is relevant to the characterization and monitoring of vegetation dynamics. Despite numerous land use/cover mapping efforts at local, regional and global scales, vegetation physiognomic mapping remains limited. The Moderate Resolution Imaging Spectroradiometer (MODIS) based Land Cover Type product (MCD12Q1) is one of the most recent global land cover products from which vegetation physiognomic information can be obtained. The MCD12Q1 product classifies land use/cover types using an ensemble-based supervised classification algorithm (decision trees) supported by training data from 1860 sites distributed across the Earth’s land areas. However, to the best of our knowledge, the accuracy and applicability of the MCD12Q1 product in terms of physiognomy-based vegetation types have not yet been assessed in Japan.
Studies of the composition of forests and vegetation in Japan were initiated in the early Meiji Era (1868-1915), and the first vegetation map was prepared in 1900 based on a field survey of dominant species. Kira et al. developed the Warmth Index (WI), defined as the annual sum of monthly mean temperatures above 5˚C, and described the forest zones of Japan in relation to temperature. Based on the WI values, Japan is divided into five climatic zones: warm-temperate, semi-temperate, cool-temperate, subalpine, and alpine. The vegetation maps were further elaborated by many researchers, who emphasized physiognomy, species composition, plant sociology, and succession. Ohsawa analyzed the two-dimensional spatial distribution of Japanese vegetation using latitude and altitude data. A brief description of the natural and semi-natural vegetation of Japan is provided by Numata et al. Following the degradation or destruction of the original natural vegetation by human interference during rapid industrialization and urbanization, wise land use and re-vegetation planning began, and national parks and nature reserves were established in recognition of the need to conserve the remaining fragments of climax forests. Himiyama et al. produced the original Land Use Information System (LUIS) maps with 24 land use/cover classes. Himiyama analyzed land use/cover changes in Japan over the last hundred years, and projected that paddy fields would be the land use type most affected by urban expansion in central Japan by the 2020s. Hara et al. developed a model, called the Landscape Transformation Sere, for interpreting changes in land use/cover patterns and vegetation caused by increasing levels of human activity through a series of stages. Harada et al. digitized the LUIS maps available for 1900, 1950, and 1985, and analyzed the land use/cover changes between 1900 and 1985.
Further land use/cover mapping for the year 2001 using MODIS data showed that overall forest cover increased slightly from 72.1% in 1900 to 76.9% in 2001; however, in many areas, the climax vegetation was replaced by timber plantations. In their study, the vegetation types were labeled and mapped using clusters obtained from an unsupervised classification based on the Iterative Self-Organizing Data Analysis Technique. However, labeling the resulting clusters as vegetation physiognomic types through visual interpretation of very-high-resolution images and/or expert knowledge is a difficult and time-consuming task.
The availability of time series of surface reflectance data from MODIS onboard the Terra and Aqua satellites provides a unique opportunity for monitoring vegetation phenology, and thereby for mapping vegetation physiognomic types. This study used MODIS data from 2013 to produce a vegetation physiognomic map of Japan. The objective of the research was to produce an accurate vegetation physiognomic map using an automated machine learning approach supported by reference data. The resulting vegetation physiognomic map was compared to the MCD12Q1 product over Japan, and the accuracy of the newly produced map and of the MCD12Q1 product is discussed.
2.1. Study Area
This research covers the entire national land area of Japan. Vegetation is an integral part of the Japanese landscape; more than two-thirds of the national land is covered by forests. Japan has high species diversity; approximately 7000 floral species have been recorded, and around 2900 of them are endemic to Japan. The flora of Japan is also characterized by a richness of endemic families and genera. The Sino-Japanese region, which covers almost all of the Japanese archipelago, contains 15 endemic families and more than 300 endemic genera, whereas no other floristic region in the world contains more than five endemic families.
The climate of Japan is mostly temperate; arctic and subtropical climates are found seasonally in northern and southwestern Japan, respectively. Japan is under the influence of a monsoon climate: the summer monsoon brings a large amount of rain to the southeast side, whereas the winter monsoon brings a large amount of snowfall to the northwest side and Hokkaido. The annual mean temperature ranges from 0˚C (central Hokkaido) to 18˚C (southern Kyushu), and annual precipitation ranges from 600 mm to 4000 mm. Topographically, 77% of the land lies between 0 and 700 m elevation, whereas 5% is highland above 1300 m. A great variety of vegetation has flourished under this wide range of climate, temperature, precipitation and topography.
Vegetation in Japan has been subject to severe disturbance from rapid industrialization and urbanization over the past couple of centuries. Irrigated rice farming began about 3000 years ago, and since then many old-growth forests, especially in the warm-temperate and mid-temperate zones, have been converted into secondary woodlands and forestry plantations. Japanese vegetation is also prone to damage from earthquakes, tsunamis, and volcanoes. Mapping and long-term monitoring of vegetation are necessary to promote the conservation of biological diversity and ecosystem services.
2.2. Preparation of Reference Data
Based on physiognomic characteristics, vegetation is classified into eight classes in this research: Evergreen Coniferous Forest, Evergreen Broadleaf Forest, Deciduous Coniferous Forest, Deciduous Broadleaf Forest, Shrubland, Arable Land, Herbaceous, and Non-vegetation. The classification scheme is described in Table 1.
In Japan, terrestrial vegetation has been surveyed continuously since 1973. The vegetation survey procedure involves field inspection of the unique vegetation types and recording of plant community types along with geolocation points. The plant communities are classified by experts according to vegetation association, that is, the occurrence of diagnostic/dominant species in the uppermost (and understory) stratum. A lookup table was prepared to reclassify the plant community types into physiognomy-based types by studying the physiognomic characteristics of all plant communities. The geolocation points were further verified to represent large homogeneous areas (at least the size of a single MODIS pixel) using Google Earth time-lapse images. Finally, for each physiognomic class, 300 geolocation points distributed across Japan were confirmed and used as the reference data.
2.3. Processing of Satellite Data
Table 1. Vegetation physiognomic classification scheme used in the research.
MODIS Surface Reflectance 8-Day Level 3 Global 500 m data from the Terra and Aqua satellites (MOD09A1 and MYD09A1), available from the United States Geological Survey (USGS), were processed for Japan for the year 2013 and used in this research. The MOD09A1 and MYD09A1 products provide an estimate of the surface spectral reflectance of bands 1 - 7 (Red, Near Infrared, Blue, Green, Mid Infrared, Shortwave Infrared 1, and Shortwave Infrared 2) as it would be measured at ground level in the absence of atmospheric scattering or absorption. Only the highest-quality surface reflectance data were used: pixels affected by clouds, cloud shadows, cirrus, and large solar zenith angles were masked out using the quality band descriptions available in the dataset. Three spectral indices, the Normalized Difference Vegetation Index, Superfine Water Index, and Urban Built-up Index, were also calculated for each scene. The 8-day data containing surface reflectance (7 bands) and the three spectral indices were composited using monthly and percentile-based techniques. Eleven percentile composites (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) and twelve monthly median composites (January to December) were calculated pixel by pixel for each of these ten layers (10 × 23 = 230 features). In addition, land surface slope data were prepared from 30 m resolution Shuttle Radar Topography Mission based Digital Terrain Elevation Data available from the USGS, and resampled to the MODIS pixel size (500 m). Altogether, 231 features (input layers) were prepared and used for machine learning and mapping of the vegetation physiognomic types. The input features are described in Table 2.
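The compositing step described in this section can be illustrated with a minimal pure-Python sketch for a single pixel and a single layer. The function names (`ndvi`, `percentile`, `composite_pixel`), the linear-interpolation percentile rule, and the handling of months without clear observations are assumptions made for illustration; the text does not specify these details.

```python
from statistics import median

PERCENTILES = range(0, 101, 10)  # 0, 10, ..., 100 -> 11 composites


def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one observation."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def composite_pixel(series):
    """series: list of (month, value) pairs for one pixel and one layer,
    with masked (e.g. cloudy) observations already removed.
    Returns 11 percentile composites followed by 12 monthly medians."""
    values = [v for _, v in series]
    feats = [percentile(values, p) for p in PERCENTILES]
    for month in range(1, 13):
        monthly = [v for m, v in series if m == month]
        # None marks a month with no clear observation; gap handling is
        # not described in the text, so it is left open here.
        feats.append(median(monthly) if monthly else None)
    return feats


# Feature count check: (7 bands + 3 indices) x (11 + 12) composites + slope
n_layers = 7 + 3
n_features = n_layers * (len(PERCENTILES) + 12) + 1
```

Under these assumptions, `n_features` reproduces the 231 input layers reported in the text, and `composite_pixel` returns 23 values per layer.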
2.4. Mapping and Comparative Analysis
Table 2. Description of the features used in the research.
The mapping framework developed in a previous study, which is suitable for large-scale land use/cover mapping, was adopted to produce a 500 m resolution vegetation physiognomic map of Japan for the year 2013. This framework automatically selects the optimum features, that is, the smallest set of input features yielding the highest kappa coefficient for the given reference data, and uses a Random Forests based supervised classification model to produce the land use/cover map. Retrieving the optimum features not only selects the best features for discriminating the classes, but also reduces computation time and effort. Because of its computational speed, the Random Forests based supervised classification method was adopted for nationwide mapping, which involves a large volume of data. Research using the Random Forests classifier for remote sensing applications has been growing rapidly in recent years. Finally, the resulting vegetation physiognomic map was compared to the MCD12Q1 product. The MCD12Q1 product provides annual 500 m resolution global land use/cover maps from 2001 to 2013 under five classification schemes. In this research, the International Geosphere-Biosphere Programme (IGBP) layer of the MCD12Q1 Version 5.1 product, which consists of 17 land cover classes, was used. It is available in the Hierarchical Data Format (HDF) in the sinusoidal projection. For comparison, it was re-projected into the Geographical Coordinate System (GCS) and remapped according to the classification scheme described in Table 1.
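The notion of optimum features used in this section (the smallest set of input features that yields the highest kappa coefficient) can be sketched as a search over nested feature subsets. This is only one plausible reading and is not the cited framework's actual procedure: the importance-based ranking, the `evaluate_kappa` callback, the toy scorer, and the feature names are all hypothetical.

```python
def select_optimum_features(ranked_features, evaluate_kappa, tol=1e-9):
    """Return the shortest prefix of `ranked_features` whose kappa equals
    the best kappa found over all prefixes.

    ranked_features: feature names ordered from most to least important
        (e.g. by Random Forests variable importance; an assumption here).
    evaluate_kappa: hypothetical callback that trains and validates a
        classifier on the given feature subset and returns its kappa.
    """
    scores = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        scores.append((evaluate_kappa(subset), k))
    best_kappa = max(score for score, _ in scores)
    # Among the subsets reaching the best kappa, keep the smallest one.
    best_k = min(k for score, k in scores if score >= best_kappa - tol)
    return ranked_features[:best_k], best_kappa


# Toy usage with a made-up scorer whose kappa saturates after three features.
def toy_kappa(subset):
    return min(len(subset), 3) * 0.25


features = ["ndvi_p50", "nir_jul", "slope", "swi_p10", "red_jan"]
chosen, kappa = select_optimum_features(features, toy_kappa)
```

In practice the callback would wrap model training and validation on the reference data, which is where nearly all of the computation time is spent.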
3. Results and Discussion
3.1. Production of JpVP-500 Map
Figure 1. Nationwide vegetation physiognomic map of year 2013 produced through the research: (a) Display over the national territory, (b) Zoomed in over the black polygon region in (a). The national boundary is based on Global Administrative Areas database (GADM) version 2.8, Nov. 2015.
The 500 m resolution vegetation physiognomic map of year 2013 produced in this research is displayed in Figure 1. The map was produced by training the Random Forests model on 75% of the reference point data prepared in this research, using the 148 optimum features. The resulting map, produced by the automated machine learning approach, did not require any post-editing.
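The 75%/25% division of the reference points into training and test data can be sketched as below. The text does not state how the split was drawn, so the per-class (stratified) sampling, the random seed, and the placeholder point identifiers are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(points, train_fraction=0.75, seed=0):
    """points: list of (point_id, class_label) pairs.
    Returns (train, test) with `train_fraction` of each class in train."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for point in points:
        by_class[point[1]].append(point)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# 8 classes x 300 reference points, as in Section 2.2 (labels are placeholders).
points = [(f"c{c}_p{i}", c) for c in range(8) for i in range(300)]
train, test = stratified_split(points)
```

With 300 points per class, this gives 225 training and 75 test points in each class, that is, 1800 and 600 points overall.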
3.2. Comparison to MCD12Q1
The resulting map was compared to the MCD12Q1 product of year 2013. A direct comparison between the two is not possible because they use different legends. Therefore, the MCD12Q1 product was remapped to the definitions used in our map as far as possible. The remapping procedure is explained in Table 3.
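A legend remapping of this kind is essentially a lookup table from IGBP class codes to physiognomic classes. The sketch below uses the standard 17 IGBP codes (0-16), but the target assignments are illustrative guesses and do not reproduce Table 3. Classes without an obvious counterpart, including the grasslands, croplands, and wetlands discussed in Section 3.3, are marked as unmatched.

```python
# IGBP class codes as used in MCD12Q1 (0-16). The target assignment below
# is an illustrative guess, NOT the paper's Table 3.
UNMATCHED = None

IGBP_TO_PHYSIOGNOMY = {
    0: "Non-vegetation",               # water
    1: "Evergreen Coniferous Forest",  # evergreen needleleaf forest
    2: "Evergreen Broadleaf Forest",   # evergreen broadleaf forest
    3: "Deciduous Coniferous Forest",  # deciduous needleleaf forest
    4: "Deciduous Broadleaf Forest",   # deciduous broadleaf forest
    5: UNMATCHED,                      # mixed forests
    6: "Shrubland",                    # closed shrublands
    7: "Shrubland",                    # open shrublands
    8: UNMATCHED,                      # woody savannas
    9: UNMATCHED,                      # savannas
    10: UNMATCHED,                     # grasslands
    11: UNMATCHED,                     # permanent wetlands
    12: UNMATCHED,                     # croplands
    13: "Non-vegetation",              # urban and built-up
    14: UNMATCHED,                     # cropland/natural vegetation mosaic
    15: "Non-vegetation",              # snow and ice
    16: "Non-vegetation",              # barren or sparsely vegetated
}


def remap(igbp_codes):
    """Map a sequence of IGBP codes to physiognomic class names.
    Codes outside 0-16 (e.g. fill values) are treated as unmatched."""
    return [IGBP_TO_PHYSIOGNOMY.get(code, UNMATCHED) for code in igbp_codes]
```

For example, `remap([1, 5, 7])` returns the Evergreen Coniferous Forest label, an unmatched marker, and the Shrubland label under this hypothetical table.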
The remapped vegetation physiognomic information extracted from the MCD12Q1 product is plotted in Figure 2. A comparison of Figure 1 and Figure 2 shows a large difference. Almost all vegetation types in Japan are simply classified as mixed forests by the MCD12Q1 product. Not only forests but also large areas of shrublands are misclassified as mixed forests.
3.3. Validation Results
Table 3. Remapping of the MCD12Q1 product.
Figure 2. Vegetation physiognomic map extracted from the MCD12Q1 product of year 2013: (a) Display over the national territory, (b) Zoomed in over the black polygon region in (a).
The performance of the resulting vegetation physiognomic map was assessed by computing two accuracy metrics, overall accuracy and the kappa coefficient, using 25% of the reference point data (test data). Overall accuracy, the number of correctly classified validation points divided by the total number of validation points, measures the correctness of the classification. The kappa coefficient measures inter-rater agreement as the proportion of instances in which the predictions agree with the validation data (observed agreement), adjusted for the proportion of agreement expected by chance (expected agreement).
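The two metrics can be computed directly from paired reference and predicted labels. The sketch below is a minimal stdlib implementation of overall accuracy and Cohen's kappa, kappa = (observed - expected) / (1 - expected); the function name and the toy labels are illustrative and are not taken from the study's data.

```python
from collections import Counter


def overall_accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's kappa for two equal-length label lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal class proportions, summed.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: a single class everywhere
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


reference = ["forest", "forest", "shrub", "shrub", "herb", "herb"]
predicted = ["forest", "forest", "shrub", "herb", "herb", "herb"]
oa, kappa = overall_accuracy_and_kappa(reference, predicted)
```

In this toy case five of six points agree, so the overall accuracy is 5/6; the expected agreement is 12/36, giving a kappa of 0.75.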
The accuracy of the MCD12Q1 product was assessed using the reference point data prepared in this research. For this purpose, all 2400 reference points were used. Our map separates grasslands into herbaceous (natural grasslands) and arable (cultivated pastures), but puts cultivated pastures and croplands into a single arable class. Similarly, our map separates wetlands into herbaceous (marshlands) and Evergreen Broadleaf Forest (mangroves), whereas the MCD12Q1 product does not separate permanent wetlands into mangrove trees and herbaceous marshlands. Therefore, the quantitative validation of the MCD12Q1 product excluded the unmatched classes (grasslands, croplands, and wetlands). The performance of the two maps is summarized in Table 4.
The overall accuracy (kappa coefficient) calculated for the MCD12Q1 product and our map is 0.32 (0.24) and 0.82 (0.79), respectively. The validation results show that our map performed far better than the MCD12Q1 product. The MCD12Q1 product is essentially a land use/cover map that was not designed solely for vegetation physiognomic mapping. The reference point data, prepared through a careful study of the physiognomic characteristics of the plant communities in Japan, resulted in an accurate vegetation physiognomic map in this research. The 17-class MCD12Q1 product is based on 1860 globally distributed training sites, whereas 2400 reference points were prepared for Japan alone to produce the 8-class vegetation physiognomic map, a far higher sampling density. The physiognomy of vegetation worldwide is so diverse that it is difficult to classify it using only 1860 sites.
In this research, rich feature data exploited with the Random Forests based mapping framework provided a reliable classification (overall accuracy = 0.82, kappa coefficient = 0.79) of the vegetation physiognomic types in Japan. The comparison of the resulting vegetation physiognomic map with the MODIS Land Cover Type product (MCD12Q1), based on the reference data prepared in this research, showed a large difference. Most vegetation types in Japan are simply classified as mixed forests by the MCD12Q1 product. The validation results call into question the applicability of the MCD12Q1 product for vegetation physiognomy in Japan, and highlight the possibility of improving its accuracy with a special focus on reference data. Although this research provided a far better classification of the nationwide vegetation physiognomic types than the MCD12Q1 product, the classification accuracy may not be sufficient for tracking vegetation physiognomic changes over the years. Therefore, further research, especially on inter-class discrimination of the vegetation physiognomic types, is recommended to increase the classification accuracy.
Table 4. Performance of different vegetation physiognomic maps.
This research was conducted under the Environment Research and Technology Development Fund (1-1405) of the Ministry of the Environment, Japan; and the Ministry of Education, Culture, Sports, Science and Technology (MEXT) Japan Grant-in-Aid for Scientific Research (26350403). The authors are grateful to the Biodiversity Center of Japan, Nature Conservation Bureau, Ministry of the Environment for providing access to the vegetation survey data.
 Friedl, M.A., Sulla-Menashe, D., Tan, B., Schneider, A., Ramankutty, N., Sibley, A. and Huang, X. (2010) MODIS Collection 5 Global Land Cover: Algorithm Refinements and Characterization of New Datasets. Remote Sensing of Environment, 114, 168-182.
 Nogami, M. (1994) Thermal Condition of the Forest Vegetation Zones and Their Potential Distribution under Different Climates in Japan. Japanese Journal of Geography, 103, 886-897.
 Miyawaki, A. and Sasaki, Y. (1985) Floristic Changes in the Castanopsis cuspidata var. sieboldii-Forest Communities along the Pacific Ocean Coast of the Japanese Islands. Vegetatio, 59, 225-234.
 Ohno, K. (1991) A Vegetation-Ecological Approach to the Classification and Evaluation of Potential Natural Vegetation of the Fagetea Crenatae Region in Tohoku (Northern Honshu), Japan. Ecological Research, 6, 29-49.
 Miyawaki, A. and Fujiwara, K. (1988) Vegetation Mapping in Japan. In: Küchler, A.W. and Zonneveld, I.S., Eds., Vegetation Mapping, Springer, Berlin, 427-441.
 Himiyama, Y., Arai, T., Ota, I., Kubo, S., Tamura, T., Nogami, M., Murayama, Y. and Yorifuji, T. (1995) Atlas: Environmental Change in Modern Japan. Asakura Shoten, Tokyo. (In Japanese with English Abstract)
 Himiyama, Y. (1998) Land Use/Cover Changes in Japan: From the Past to the Future. Hydrological Processes, 12, 1995-2001.
 Hara, K., Harada, I., Tomita, M., Short, K., Park, J., Shimojima, H., Fujihara, M., Hirabuki, Y., Hara, M. and Kondoh, A. (2010) Landscape Transformation Sere: In Which Directions Will Our Landscape Move and How Can We Monitor These Changes. In: Burel, F. and Baudry, J., Eds., Landscape Ecology: Methods, Applications and Interdisciplinary Approach, Slovak Academy of Sciences, Bratislava, 165-172.
 Harada, I., Hara, K., Tomita, M., Short, K. and Park, J. (2015) Monitoring Landscape Changes in Japan Using Classification of MODIS Data Combined with a Landscape Transformation Sere (LTS) Model. Journal of Landscape Ecology, 7, 23-38.
 Sharma, R., Tateishi, R., Hara, K. and Nguyen, L. (2015) Developing Superfine Water Index (SWI) for Global Water Cover Mapping Using MODIS Data. Remote Sensing, 7, 13807-13841.
 Sharma, R.C., Tateishi, R., Hara, K., Gharechelou, S. and Iizuka, K. (2016) Global Mapping of Urban Built-Up Areas of Year 2014 by Combining MODIS Multispectral Data with VIIRS Nighttime Light Data. International Journal of Digital Earth, 9, 1004-1020.
 Sharma, R., Tateishi, R., Hara, K. and Iizuka, K. (2016) Production of the Japan 30-m Land Cover Map of 2013-2015 Using a Random Forests-Based Feature Optimization Approach. Remote Sensing, 8, 429.
 Rodriguez-Galiano, V.F., Chica-Olmo, M., Abarca-Hernandez, F., Atkinson, P.M. and Jeganathan, C. (2012) Random Forest Classification of Mediterranean Land Cover Using Multi-Seasonal Imagery and Multi-Seasonal Texture. Remote Sensing of Environment, 121, 93-107.
 Sharma, R.C., Tateishi, R. and Hara, K. (2016) A Biophysical Image Compositing Technique for the Global-Scale Extraction and Mapping of Barren Lands. ISPRS International Journal of Geo-Information, 5, 225.