Mapping US Urban Extents from MODIS Data Using One-Class Classification Method

Wan, Bo; Guo, Qinghua; Fang, Fang; Su, Yanjun; Wang, Run

doi:10.3390/rs70810143

¹ Faculty of Information Engineering, China University of Geosciences, No. 388 Lumo Road, Wuhan 430074, China

² Sierra Nevada Research Institute, School of Engineering, University of California at Merced, 5200 North Lake Road, Merced, CA 95343, USA

* Author to whom correspondence should be addressed.

Remote Sens. 2015, 7(8), 10143-10163; https://doi.org/10.3390/rs70810143

Submission received: 16 May 2015 / Revised: 16 May 2015 / Accepted: 4 August 2015 / Published: 10 August 2015

Abstract

Urban areas are one of the most important components of human society. Their extents have been continuously growing during the last few decades. Accurate and timely measurements of the extents of urban areas can help in analyzing population densities and urban sprawl and in studying environmental issues related to urbanization. Urban extents detected from remotely sensed data are usually a by-product of land use classification results, and their interpretation requires a full understanding of land cover types. In this study, for the first time, we mapped urban extents in the continental United States using a novel one-class classification method, i.e., positive and unlabeled learning (PUL), with multi-temporal Moderate Resolution Imaging Spectroradiometer (MODIS) data for the year 2010. The Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) night stable light data were used to calibrate the urban extents obtained from the one-class classification scheme. Our results demonstrated the effectiveness of the PUL algorithm in mapping large-scale urban areas from coarse-resolution remote-sensing images. The overall accuracy of the mapped urban areas was 92.9% and the kappa coefficient was 0.85. The use of DMSP-OLS night stable light data can significantly reduce false detection rates from bare land and cropland far from cities. Compared with traditional supervised classification methods, the one-class classification scheme can greatly reduce the effort involved in collecting training datasets, without losing predictive accuracy.

Keywords:

urban mapping; remote sensing; MODIS; classification; time series; one-class

1. Introduction

Urban areas, which are characterized by high population densities and extensive human features, are important components of human society, and significantly influence the environment [1]. The continuous acceleration of urbanization has resulted in a series of ecological and environment problems such as the greenhouse effect, urban heat island effect, and air-pollution consequences [2,3,4,5]. Accurate and timely mapping of urban areas is critical for monitoring urbanization, to provide answers to the related wide range of environmental research questions [6,7,8].

Remote-sensing techniques can provide up-to-date land surface measurements on a large spatial scale and are widely used to extract urban areas using image classification methods. Several global land cover maps that include the extents of urban areas have been developed using different types of remotely sensed data and classification techniques. The International Geosphere-Biosphere Program, Data and Information Systems (IGBP-DIS) produced a global land cover map with 17 classes from monthly advanced very-high-resolution radiometer (AVHRR) normalized difference vegetation index (NDVI) composites covering 1992–1993, using a continent-by-continent unsupervised classification method and extensive post-classification stratification [9]. The global land cover dataset (GLC series) was produced by an international partnership of 30 research groups coordinated by the European Commission’s Joint Research Centre in 2000; it provided 22 general land cover types, including artificial surfaces, using unsupervised classifiers based on daily data from the VEGETATION sensor on-board SPOT 4 [10]. This product was further updated in 2005 [11] and 2009 [12]. The Global Rural–Urban Mapping Project (GRUMP) dataset, managed by the Earth Institute, Columbia University, is composed of eight subsets, and its urban areas were identified based on observations of nighttime lights collected by the Department of Defense meteorological satellites over several decades [13]. MODIS Urban Land Cover 500-m [8,14] and the earlier dataset MODIS Urban Land Cover 1-km [15], one of the top-accuracy coarser-resolution urban land cover datasets [16], used a supervised decision tree classification algorithm to separate urban extents from other land covers. Among regional land cover maps, the National Land Cover Database (NLCD), which includes urban land and was created by the Multi-Resolution Land Characteristics (MRLC) Consortium, is the most widely used national land cover product. It has been released as a series of datasets for 1992, 2001, 2006, and 2011, serves as the definitive Landsat-based land cover database at 30 m resolution, and is based primarily on a decision-tree classifier [17,18,19,20].

All these current urban-mapping results are highly reliant on a full understanding of all land cover types [21]. For methods using unsupervised classification algorithms, the total number of land cover types needs to be known first. Methods that use supervised classification algorithms require the selection of highly representative and complete training sets. However, because of the lack of understanding of ground cover types, certain land cover types may be missed in the sample collection procedure, which may result in classifier mislabeling of some unknown classes as existing known classes provided by the training set; this decreases the classification accuracy [22]. Moreover, urban areas can sometimes be easily confused with other land cover types [23]. For example, spaces between residential houses are usually covered with grassland and trees, which may result in these areas being inclined to have the spectral characteristics of vegetation instead of built-up areas. It is hard to determine whether or not all types of representative mixed urban samples have been collected, especially in coarser-resolution remote-sensing images.

Urban classification can be regarded as a one-class or True–False classification question, i.e., only urban and non-urban should be used as labels in the classification scheme. One-class classification algorithms aim to extract a specific class from input datasets [24,25,26], and have been used to map certain land cover types. For example, the one-class support vector machine (OCSVM) method [27] was used to extract impervious surfaces from VHR imagery [28,29]. The support vector data description (SVDD) was used to map coastal saltmarsh habitats [30] and fenland [31] from Landsat Enhanced Thematic Mapper Plus (ETM+) imagery. However, these methods can be trained only on samples from the class of interest; in addition, their free parameters are difficult to tune, and the complicated model selection procedures hinder their adoption [32].

The positive and unlabeled learning (PUL) algorithm is a novel one-class classification algorithm proposed by Elkan and Noto [33]. Unlike other one-class classification methods, it only requires the collection of presence and background samples, instead of both presence and absence samples, which further reduces the work involved in sample collection. Moreover, the background/unlabeled samples in PUL can be both positive and negative, which can help to improve the classification accuracy [22]. This has proved to be one of the best one-class methods for classifying land cover based on remote-sensing images [32]. However, to the best of our knowledge, no study has focused on using the one-class classification method to map continental-scale urban areas from coarse-resolution images.

The objective of this study is to apply the PUL algorithm to continental-scale urban land mapping from coarse remote-sensing data for the first time, and to evaluate its effectiveness. We used a small number of urban points and randomly selected background points to map urban areas in contiguous regions of the US using multi-temporal MODIS data; this greatly shortens the sampling cycle and saves resources. The performance of the predicted map was calibrated with the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) data and investigated using multidimensional analysis.

2. Data and Method

2.1. Urban Extent

There are several definitions of the term “urban area” [34,35,36,37,38,39,40,41]. The definition varies depending on the research perspective, which results in differences in classification products. As our classification represents the physical attributes of urban land rather than definitions based on land use, which refer to the land cover types present, we use the definition proposed by Schneider et al. (2010) [8] to represent our concept of urban land: urban areas are places dominated by the built environment, including all non-vegetative, human-constructed elements such as roads, buildings, and runways (i.e., human-made surfaces); “dominated” implies coverage of greater than 50% of a given landscape unit, with a minimum area of 1 km². This definition reflects our methodological approach to mapping urban areas, in terms of the physical attributes and composition of the land cover.

2.2. Dataset

The study area (Figure 1) was the contiguous United States (US), i.e., the 48 contiguous states plus Washington, DC (federal district), occupying a combined area of 8,080,464 km², which is 1.58% of the total surface area of the Earth (http://en.wikipedia.org/wiki/Contigous_United_State).

Figure 1. Study area consisting of contiguous states in the US, with locations of two examples of tiles of MODIS data. Left box, labeled “Area 1”, covers the H08V05 tile of original MODIS data, and right box, labeled “Area 2”, covers the H11V05 tile.

The Terra MODIS Surface-Reflectance dataset (MOD09A1) for 2010 was used to map the urban extents in this study, with an original spatial resolution of 463 m, covering the study area with fourteen tiles. Seven spectral bands, which were explicitly designed for land applications, were selected, i.e., red (band 1), NIR (band 2), blue (band 3), green (band 4), thermal IR (band 5), and mid-IR (bands 6 and 7). Because of the large geographical span of the study area, the urban mapping was processed tile by tile to minimize the influence of different atmospheric conditions.

Before training and predicting, essential preprocessing of the data was performed. Generally, the vegetation status varies with growth periodicity from season to season [42]. We used the multi-temporal MODIS data for 2010, choosing four scenes of cloudless images for each tile to reflect the four major seasons (Figure 2). The 28 selected bands from the four selected images for each tile were combined as a single dataset and standardized to an interval from 0 to 1.
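The band stacking and rescaling described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "standardized to an interval from 0 to 1" means per-band min-max scaling, and the function name `stack_and_scale` and the synthetic input arrays are hypothetical.

```python
import numpy as np

def stack_and_scale(scenes):
    """Stack seasonal scenes along the band axis and rescale each band to [0, 1].

    scenes: list of arrays shaped (bands, rows, cols), one per season.
    Per-band min-max scaling is an assumption about the standardization used.
    """
    stacked = np.concatenate(scenes, axis=0).astype(np.float64)
    # Per-band minimum and maximum over all pixels of the tile.
    lo = np.nanmin(stacked, axis=(1, 2), keepdims=True)
    hi = np.nanmax(stacked, axis=(1, 2), keepdims=True)
    # Guard against division by zero for constant bands.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (stacked - lo) / span

# Synthetic stand-in for four 7-band scenes of one tile (not real MOD09A1 data).
rng = np.random.default_rng(0)
scenes = [rng.integers(0, 10000, size=(7, 50, 60)) for _ in range(4)]
features = stack_and_scale(scenes)
print(features.shape)  # (28, 50, 60)
```

Each pixel then carries a 28-element feature vector (7 bands × 4 seasons), which is the input dimension assumed for the classifier.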

Figure 2. Examples of selected MODIS data at four different times in Area 2. Location of tile Area 2 is shown in Figure 1. (a) MODIS image from 14 March to 21 March 2010; (b) MODIS image from 4 July to 11 July 2010; (c) MODIS image from 6 September to 13 September 2010; and (d) MODIS image from 9 November to 16 November 2010.

DMSP-OLS nighttime light data, with a 1 km spatial resolution, were used in post-processing to calibrate the resulting map. A DMSP-OLS image has digital number (DN) values ranging from 0 to 63 [43]; a higher value represents a higher mean population density [44,45]. The selected DMSP-OLS data were a composite based on multi-temporal archived DMSP-OLS images for the year 2010, which were downloaded from the National Geophysical Data Center (NGDC). In addition, an independent set of urban sample points from 3473 different cities was used to evaluate city omission cases in our results. These cities were selected by population size (>10,000) from a map layer that includes cities and towns in the US (published by the National Atlas of the United States [46]).

2.3. Method

The classification process in this study involved four stages: sampling, training, predicting, and post-processing (Figure 3). (1) Training sets for each tile and testing sets for the whole study area were selected; these two datasets were selected randomly and independently. (2) The classifiers were trained tile by tile with the PUL algorithm. (3) The class (urban or non-urban) of the pixels in each tile was predicted with the corresponding classifier. (4) All tiles were mosaicked, the obtained classification map was calibrated with DMSP-OLS nighttime images, and tiny urban blocks were removed.

Figure 3. Procedure for proposed urban-mapping scheme using PUL one-class classification algorithm.

2.3.1. Sampling

As already mentioned, in this study, the urban extents were mapped tile by tile. For each tile, we randomly selected 1000 pixels representing urban areas by manual interpretation. These pixels were then overlaid on high-resolution satellite imagery from Google Earth™ to verify their land use type through manual interpretation. In addition to these samples, 5000 background samples were randomly selected, which were distinct from the 1000 urban samples. Because over 85% of tile H08V04 is covered by ocean, it was very difficult to select enough urban samples for training the one-class classifier. This tile was therefore merged with the neighboring tile that shared the longest boundary. Overall, a total of 78,000 training points were selected, containing 13,000 urban truth points (Figure 4) and 65,000 background points.

Figure 4. Distribution of selected urban sample points across the US.

The same sampling method was used to establish an independent validation sample set for assessing the accuracy of our results. These samples were distinct from the training samples, and were selected across the range of the study area, and contained 20,000 random points, composed of 8800 urban points and 11,200 non-urban points.

2.3.2. PUL

PUL is a general one-class classifier learning method. In PUL, the target class is defined as positive (y = 1) and all other classes are grouped together as negative (y = −1). Let x be an individual sample at pixel level; x is defined as labeled (s = 1) if its class is explicitly known, and unlabeled (s = 0) if its class is unknown. Here, y ∈ {1, −1} represents the class of the sample (positive/negative), and s ∈ {1, 0} denotes whether or not a sample is assigned a label. The aim is to predict the probability of individual samples being positive by training the function f(x) = p(y = 1|x).

PUL only requires positive (y = 1) and unlabeled (s = 0) samples. It can be inferred that the labeled sample must be positive (y = 1 if s = 1), resulting in p(y = 1|x, s = 1) = 1; the unlabeled sample can be either positive or negative (y = 1 or y = −1 if s = 0); the probability of a negative sample x being labeled is zero, as shown in Equation (1):

p (s = 1 | x, y = - 1) = 0

(1)

Suppose that each positive sample has the same probability of being labeled, regardless of its position x, as stated in Equation (2):

p (s = 1 | x, y = 1) = p (s = 1 | y = 1) = c

(2)

where c is a constant representing the probability of a labeled positive sample. If a binary classifier is trained with a set (x, s) that satisfies Equations (1) and (2), we can obtain a function g(x) = p(s = 1|x). According to Equation (2), we get

\begin{array}{rl} g(x) &= p(s = 1 | x) = p(y = 1 \land s = 1 | x) \\ &= p(y = 1 | x) \, p(s = 1 | y = 1, x) \\ &= p(y = 1 | x) \, p(s = 1 | y = 1) \end{array}

Substituting f(x) = p(y = 1|x) and c = p(s = 1|y = 1), which we defined, we have

f (x) = g (x) / c

(3)

We can estimate c from a validation set, V, which is randomly extracted from the original training set (x, s). Let P represent the subset of V that is labeled (and therefore also positive). We get p(y = 1|x) = 1 and p(y = −1|x) = 0 for all x ∈ P. Therefore,

\begin{array}{rl} g(x) &= p(s = 1 | x) \quad (x \in P) \\ &= p(s = 1 | x, y = 1) \, p(y = 1 | x) + p(s = 1 | x, y = -1) \, p(y = -1 | x) \\ &= p(s = 1 | x, y = 1) \times 1 + 0 \times 0 \\ &= p(s = 1 | x, y = 1) \\ &= c \end{array}

This suggests that any single g(x) from the subset P can be used to estimate c. In real applications, a more reliable estimator of c is the average value of g(x) for all x ∈ P:

c = \frac{1}{n} \sum_{x \in P} g(x)

(4)

where n is the cardinality of P [33].

In summary, the desired classifier f(x) = p(y = 1|x), which predicts the probability of a sample being positive at the pixel level, was obtained in two steps. First, we trained the function g(x) = p(s = 1|x) on only positive and unlabeled samples, which satisfied Equations (1) and (2), and estimated the constant factor c = p (s = 1|y = 1) from Equation (4), using an independent validation set. Secondly, we obtained the desired classifier f (x) = p(y = 1|x) by calibrating g(x) with the constant factor c on the basis of Equation (3). This is the PUL algorithm. More details of the principle and process of its deduction can be found in [33]. It is worth noting that PUL is not a specific classifier but a general framework for classifier learning [22]. In this study, the back propagation (BP) neural network was used as the classifier.
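The two-step procedure can be illustrated with a small sketch on synthetic data. Several things here are assumptions rather than the study's setup: a plain logistic regression trained by gradient descent stands in for the BP neural network, the two Gaussian clusters and the labeling probability `c_true` are invented for the demonstration, and f(x) is clipped to [0, 1] because the estimated ratio g(x)/c can exceed 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_g(X, s, lr=0.1, epochs=5000):
    """Step 1: train g(x) = p(s = 1 | x) on labeled vs. unlabeled samples.

    Logistic regression is a stand-in here, not the BP network of the study.
    """
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - s) / len(s)
    return w

def predict_g(w, X):
    return sigmoid(add_bias(X) @ w)

rng = np.random.default_rng(0)
n = 2000
# Synthetic positives around (2, 2) and negatives around (-2, -2).
X = np.vstack([rng.normal(2.0, 1.0, (n, 2)), rng.normal(-2.0, 1.0, (n, 2))])
y = np.r_[np.ones(n), -np.ones(n)]   # hidden truth, used only for checking
c_true = 0.5                         # assumed labeling probability, Equation (2)
s = ((y == 1) & (rng.random(2 * n) < c_true)).astype(float)

# Hold out a validation set V from the training set (x, s).
idx = rng.permutation(2 * n)
val, train = idx[:800], idx[800:]
w = fit_g(X[train], s[train])

# Step 2: estimate c on the labeled part P of V (Equation (4)),
# then calibrate g to obtain f (Equation (3)).
P = val[s[val] == 1]
c_hat = predict_g(w, X[P]).mean()
f = np.clip(predict_g(w, X) / c_hat, 0.0, 1.0)

accuracy = np.mean((f >= 0.5) == (y == 1))
print(f"c_hat = {c_hat:.2f}, accuracy = {accuracy:.3f}")
```

Only s is used for training; the true classes y enter solely in the final accuracy check, which mirrors the situation in which negative (non-urban) labels are never collected.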

2.3.3. Post-Processing

Although a one-class-based urban map is obtained after the training process, some scattered misclassified areas (e.g., bare land and cropland far from city centers that are labeled as urban) are still expected. To remove these areas, we used the DMSP-OLS data to filter the one-class classification results.

DMSP-OLS nighttime images (city lights or stable lights) reflect the presence of human activities [47,48,49] and have been extensively used in urban studies. The common data format is a DN, which ranges from 0 to 63 [39]. A higher value represents a higher mean population density [44,45] and therefore indicates a higher probability of representing a city. However, it is difficult to select suitable thresholds for mapping human settlements on a large scale using DMSP-OLS data alone, because of different levels of socio-economic development [50,51]. In contrast, land covers such as forests, bare soils, and water bodies have no city lights, so their DMSP-OLS DN values are close to zero and their mapping thresholds are much easier to determine. DMSP-OLS data can therefore be used to mask out non-settlement land cover [37]. We took advantage of light-free areas to map non-urban areas, and masked out the over-estimated areas far from urban areas (e.g., cropland, bare land, and water bodies) by overlaying the two maps.

In addition, to reduce the “salt and pepper effect” near city centers, which cannot be removed by the above step, urban blocks smaller than four pixels were also removed, in accordance with the minimum area in the urban land definition.
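The two post-processing steps can be sketched as below. The sketch rests on assumptions that the text does not spell out: the night-light image is taken to be already resampled to the classification grid, pixels with DN below 12 are masked (the threshold reported in Section 4.3), and blocks are defined by 4-connectivity. The function names are hypothetical.

```python
import numpy as np
from collections import deque

def mask_with_lights(urban, dn, dn_min=12):
    """Keep urban pixels only where the night-light DN reaches dn_min."""
    return urban & (dn >= dn_min)

def remove_small_blocks(urban, min_pixels=4):
    """Drop connected urban blocks with fewer than min_pixels pixels.

    Blocks are found by breadth-first search with 4-connectivity (an assumption).
    """
    rows, cols = urban.shape
    out = urban.copy()
    seen = np.zeros(urban.shape, dtype=bool)
    for r0 in range(rows):
        for c0 in range(cols):
            if not urban[r0, c0] or seen[r0, c0]:
                continue
            block, queue = [], deque([(r0, c0)])
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                block.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and urban[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(block) < min_pixels:
                for r, c in block:
                    out[r, c] = False
    return out

# Toy example: a lit 2x2 city, a lit single-pixel speck, and an unlit 2x2 block.
urban = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 0, 0, 0],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1]], dtype=bool)
dn = np.array([[40, 40, 5, 5, 5],
               [40, 40, 5, 30, 5],
               [5, 5, 5, 5, 5],
               [0, 0, 0, 3, 3],
               [0, 0, 0, 3, 3]])
cleaned = remove_small_blocks(mask_with_lights(urban, dn))
print(cleaned.astype(int))
```

In the toy example the unlit block is removed by the light mask and the isolated speck by the size filter, so only the lit 2×2 block remains.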

2.4. Accuracy Assessment

The accuracy of the obtained classification map was evaluated based on the overall accuracy (OA) and kappa coefficient (K), denoted by

OA = \frac{N_{a}}{N_{ref}}

K = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}

where N_a is the number of classified points (urban and non-urban) that agree with the reference points, and N_ref is the total number of test samples; Pr(a) is the relative observed agreement, and Pr(e) is the hypothetical probability of chance agreement. It should be noted that although the classification map was predicted using tile units, the test samples used for accuracy assessment were selected from the entire US. In addition to the overall evaluation, two further evaluation indexes, the user’s and producer’s accuracies, were calculated to evaluate the commission and omission errors; because urban land was the target class, the focus was on the values calculated for urban areas.
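As a worked check of these definitions, the following sketch computes the overall accuracy, kappa coefficient, and user's and producer's accuracies from a confusion matrix, using the counts reported in Table 1. The function name `accuracy_metrics` is hypothetical, and Pr(e) is computed in the usual way from the row and column totals.

```python
import numpy as np

def accuracy_metrics(cm):
    """cm[i, j] = number of samples classified as class i with reference class j."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    pr_a = np.trace(cm) / n                                 # observed agreement (= OA)
    pr_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2   # chance agreement
    kappa = (pr_a - pr_e) / (1.0 - pr_e)
    users = np.diag(cm) / cm.sum(axis=1)       # per classified class (commission)
    producers = np.diag(cm) / cm.sum(axis=0)   # per reference class (omission)
    return pr_a, kappa, users, producers

# Rows: classified non-urban, urban; columns: reference non-urban, urban (Table 1).
cm = [[10912, 1130],
      [288, 7670]]
oa, kappa, users, producers = accuracy_metrics(cm)
print(f"OA = {oa:.4f}, kappa = {kappa:.4f}")  # OA = 0.9291, kappa = 0.8546
```

The overall accuracy counts agreement in both classes, (10,912 + 7670)/20,000, which is why it differs from the urban producer's accuracy of 7670/8800.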

For city-level-based assessments, we used an independent urban sample set of 3473 cities located in the US to further evaluate city omission cases. To track the trends in omission distribution, the city samples were separated, based on their populations, into five levels: large (>4.2 million), medium (1.5–4.2 million), medium–small (0.5–1.5 million), small (0.1–0.5 million), and very small (<0.1 million). Because major cities might be erroneously represented by only a few pixels, we used a slightly stricter definition of an “omitted” city [36], requiring a minimum of 5 km² of urban land for a city to be considered “present”. This effectively eliminates such erroneous instances.
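The city-level bookkeeping can be sketched as follows. The sketch is illustrative only: it assumes square pixels at the 463 m MOD09A1 spacing when converting pixel counts to area, it assigns exact boundary populations to the larger class, and the example cities are invented.

```python
PIXEL_AREA_KM2 = 0.463 ** 2  # assumption: square pixels at the 463 m spacing

def size_class(population):
    """Population level from the text; handling of boundary values is assumed."""
    millions = population / 1e6
    if millions > 4.2:
        return "large"
    if millions >= 1.5:
        return "medium"
    if millions >= 0.5:
        return "medium-small"
    if millions >= 0.1:
        return "small"
    return "very small"

def is_present(urban_pixels, min_area_km2=5.0):
    """A city counts as present if its mapped urban land reaches min_area_km2."""
    return urban_pixels * PIXEL_AREA_KM2 >= min_area_km2

def omission_rates(cities):
    """cities: iterable of (population, urban_pixels); returns {level: rate in %}."""
    totals, omitted = {}, {}
    for population, urban_pixels in cities:
        level = size_class(population)
        totals[level] = totals.get(level, 0) + 1
        if not is_present(urban_pixels):
            omitted[level] = omitted.get(level, 0) + 1
    return {lvl: 100.0 * omitted.get(lvl, 0) / n for lvl, n in totals.items()}

# Hypothetical cities: (population, mapped urban pixel count inside the city).
cities = [(3_800_000, 9000), (250_000, 400), (15_000, 30), (12_000, 10)]
rates = omission_rates(cities)
print(rates)
```

Under the pixel-area assumption, the 5 km² criterion corresponds to at least 24 mapped urban pixels per city.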

3. Results

The final urban extent map of the US, i.e., the one-class-based urban map, is shown in Figure 5. As can be seen, the urban densities on the west and east coasts of the US are high, and those in the mid-west are low. Only 2.18% of the US is covered by urban areas. Accuracy assessment using randomly selected urban and non-urban samples showed that the accuracy of the obtained urban map is high. The overall accuracy for urban detection is 92.91%, and the kappa coefficient is 0.85 (Table 1). The user’s accuracy for urban areas is 96.38% (Table 1), indicating that only a small share (approximately 3.62%) of the points classified as urban were actually non-urban. The producer’s accuracy for urban areas is 87.15%, i.e., more than 85% of the reference urban points were correctly classified.

Figure 5. Urban map covering the US continent, obtained using proposed one-class classification scheme.

Table 1. Accuracy assessment for urban map obtained using PUL one-class classification scheme.

| Classified data | Reference: Non-urban | Reference: Urban | Total | User’s accuracy (%) |
|---|---|---|---|---|
| Non-urban | 10,912 | 1130 | 12,042 | 90.61 |
| Urban | 288 | 7670 | 7958 | 96.38 |
| Total | 11,200 | 8800 | 20,000 | |
| Producer’s accuracy (%) | 97.42 | 87.15 | | |

Overall map accuracy: 92.91%; kappa coefficient: 0.8546.

The PUL one-class methodology also successfully mapped all the small, medium–small, medium, and large cities, i.e., cities with populations greater than 0.1 million (Table 2). Only very small towns and villages, i.e., those with fewer than 0.1 million residents, were omission cases, and the total omission rate was low, i.e., 2.36% (Table 2). We further analyzed the omission cases of very small cities (with populations less than 0.1 million) from two aspects: population and number of housing units per square kilometer. In terms of population, these cities were divided into five groups to determine the distribution of omission cases. As Table 3 shows, the omissions are mainly for cities with populations between 0.01 and 0.04 million. Sparsely populated areas, which are often characterized as being predominantly rural, have a higher probability of being omitted. The relationship between the omission rate and the number of housing units per square kilometer confirms this. As shown in Figure 6, as the number of housing units per square kilometer decreases, the omission rate increases step by step. In particular, when the number of housing units per square kilometer is lower than 200, the omission rate increases sharply. In such cities, artificial settlements cover a relatively small part of the area (<50%), and the spectral features of non-urban land covers such as vegetation may dominate the corresponding pixels; the urban features of these cities are therefore inconspicuous, which causes omission errors.

Table 2. Statistics for omission rates of obtained urban map at city level.

| City Size by Population * | Reference Number | Predicted Number | Omission Rate (%) |
|---|---|---|---|
| Large | 1 | 1 | 0 |
| Medium | 7 | 7 | 0 |
| Medium–small | 28 | 28 | 0 |
| Small | 240 | 240 | 0 |
| Very small | 3197 | 3115 | 2.56 |
| Total | 3473 | 3391 | 2.36 |

* City size by population (million): large, >4.2; medium, 1.5–4.2; medium–small, 0.5–1.5; small, 0.1–0.5; very small, <0.1.

Table 3. Statistics for omission rates obtained for urban areas corresponding to very small cities (with populations less than 0.1 million).

| City Size by Population (million) | Reference Number | Predicted Number | Omission Rate (%) |
|---|---|---|---|
| 0.08–0.1 | 118 | 118 | 0 |
| 0.06–0.08 | 193 | 193 | 0 |
| 0.04–0.06 | 355 | 352 | 0.85 |
| 0.02–0.04 | 983 | 962 | 2.14 |
| 0.01–0.02 | 1548 | 1490 | 3.75 |
| Total | 3197 | 3115 | 2.56 |

Figure 6. Relationship between omission rates for cities and housing unit count per square kilometer.

4. Discussion

4.1. Urban Range Detection

We compared our one-class urban map with three other frequently used urban maps of the US: the urban extent extracted from the 2010 Land Cover Type Yearly L3 Global 500m SIN Grid (MCD12Q1), the 2011 NLCD urban map, and the 2010 US urban vector map. The 2010 MCD12Q1 land cover product is maintained and released by the NASA EOSDIS Land Processes Distributed Active Archive Center. It has a spatial resolution of 500 m, and the urban extent extracted from it has a total accuracy of about 93% [3,8]. The 2011 NLCD urban map was created by the Multi-Resolution Land Characteristics Consortium and has a spatial resolution of 30 m. The 2010 US urban vector map was created and released by the US Census Bureau [52].

To make a more detailed comparison, six typical urbanized regions (Los Angeles, San Francisco, Wheat Ridge, St. Louis, Lincoln, and Omaha), ranging from large cities (with a population over 3,500,000) to small cities (with a population under 31,000), were selected for comparing the accuracy of the detected urban ranges. The selected cities were analyzed by comparing their spatial patterns in the one-class-based urban map with those in the other three urban maps. Figure 7 shows that the NLCD urban map provides the most detailed information about the cities among the four maps because it has the highest resolution. Although details of the exterior space (e.g., roads and suburban districts) are lost because of the coarse 500 m resolution, the main urban shapes estimated by our method are close to those in the NLCD urban map, which means that our results preserve the primary details of cities. The one-class-based urban map also has a similar spatial shape to the US urban vector map; except for a few outliers, nearly all the sample cities are close in size to those depicted in the US urban vector map. The MCD12Q1 urban extent also has shapes similar to those in the NLCD urban map and the US urban vector map. However, it shows more false detections along the urban boundaries than our results do.

Moreover, although some slight differences still exist in the internal urban areas of those cities, our results most closely approximate the city-scale extent of the built environment in the US urban vector map. The US urban vector map only delineates the boundary of an urban area and may not recognize the non-urban pixels (e.g., urban trees, grassland, and water bodies) within a city boundary. This may be why the US urban vector map shows the largest urban area among the four maps. The city extents predicted by the one-class method cover a larger area than those in the MCD12Q1 urban extent, and this difference becomes more pronounced for medium–small cities. A possible cause is that MCD12Q1 does not correctly recognize built-up areas that are heavily mixed with vegetation along city edges. In addition, the urban coverage in the one-class classification map and in the MCD12Q1 urban extent is generally larger than that in the NLCD urban map. This may be because the NLCD urban map has a finer spatial resolution, so that the urban and non-urban parts of a mixed 500 m pixel can be differentiated instead of the whole pixel being identified as either urban or non-urban.

Figure 7. Comparison of patterns for six selected cities from one-class-based urban map, urban extent extracted from Land Cover Type Yearly L3 Global 500m SIN Grid (MCD12Q1), US urban vector map released by the United States Census Bureau and NLCD urban map created by the Multi-Resolution Land Characteristics Consortium (from left to right). The cities (from top to bottom) are Los Angeles, San Francisco, Wheat Ridge, St. Louis, Omaha and Lincoln.

4.2. Multi-Temporal Data

Previous studies have recommended seasonal images for urban mapping [53,54]. Multi-temporal images might be helpful when using one-class classification because the spectral information about other non-target land covers is limited. To evaluate the influence of multitemporal data on the one-class classification results, a comparative experiment was conducted on Area 2 using MODIS images from one season, two seasons, and four seasons, respectively (Figure 2). An examination subset was used, consisting of 1000 urban points and 1000 non-urban points randomly selected from the tile. Table 4 shows that classifying urban land areas using quadruple temporal features produces the highest user’s, producer’s, and overall accuracies and the highest kappa coefficient (0.829) among the three conditions. The compared maps (Figure 8) are consistent with this result. The number of overestimated blocks of urban areas in the resulting map decreased significantly as the number of temporal images increased, and the well-known salt and pepper effect was gradually reduced. Seasonal vegetation, especially cropland, which is easy to confuse with urban areas, was mislabeled less often, and the effect of clouds was reduced at the same time.

Generally, the surface features in a single temporal image only reflect a certain time coverage. Some land covers in different seasons can have different surfaces, e.g., in agricultural land, covers such as grass, crops, and bare soil [47,55]. The spectral changes in different land covers vary through the year, and the spectra may be similar during a certain period (e.g., settlements and harvested farmland). Urban classification using a single temporal image often leads to obvious misclassification. In addition, a single temporal image is susceptible to cloud and rain. The temporal coverage using mono-temporal MODIS data is limited. In contrast, multitemporal images can support increasing amounts of spectral information on surface features with increasing number of temporal images. The spectral contrast between seasonal vegetation in different seasons provided by multitemporal images can easily separate stable urban spectra. Multitemporal images can also be used to lower the impact of cloud contamination [47].

Although the values of the accuracy assessment indicators increased with increasing number of images, this does not mean that the more input images we have, the better the results. A classification experiment based on eight temporal images showed that more temporal images gave little improvement in the overall accuracy (OA = 89.8%, K = 0.796), possibly because of spectral redundancy, and the classification time tripled. We therefore used four temporal images with seasonal features as the original image data.

Table 4. Assessment of multi-temporal MODIS data on mapping urban extents using PUL one-class classification algorithm in Area 2. Influence is evaluated for the user’s accuracy, producer’s accuracy, overall accuracy, and kappa coefficient.

**Table 4.** Assessment of multi-temporal MODIS data on mapping urban extents using PUL one-class classification algorithm in Area 2. Influence is evaluated for the user’s accuracy, producer’s accuracy, overall accuracy, and kappa coefficient.
Temporal Features	User’s Accuracy (%)	Producer’s Accuracy (%)	Overall Accuracy (%)	Kappa Coefficient
Single	97.04	78.82	88.20	0.775
Double	98.08	81.60	90.00	0.800
Quadruple	98.14	84.50	91.45	0.829
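
The four indicators reported in Table 4 can all be derived from a two-class confusion matrix. The sketch below uses the standard definitions; the function name `urban_accuracy` and the counts in the example are hypothetical and are not the validation counts used in this study.

```python
def urban_accuracy(tp, fp, fn, tn):
    """Return user's accuracy, producer's accuracy, overall accuracy, and kappa.

    tp: urban pixels mapped as urban      fp: non-urban pixels mapped as urban
    fn: urban pixels mapped as non-urban  tn: non-urban pixels mapped as non-urban
    """
    n = tp + fp + fn + tn
    users = tp / (tp + fp)        # share of mapped urban that is truly urban
    producers = tp / (tp + fn)    # share of true urban that was mapped
    overall = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return users, producers, overall, kappa

# Hypothetical counts for illustration only.
users, producers, overall, kappa = urban_accuracy(tp=80, fp=5, fn=20, tn=95)
print(f"UA={users:.3f} PA={producers:.3f} OA={overall:.3f} K={kappa:.3f}")
```

With these hypothetical counts the values are approximately 0.941, 0.800, 0.875, and 0.750. A user's accuracy well above the producer's accuracy, as in Table 4, indicates that omitted urban pixels are more common than falsely detected ones.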

Figure 8. Comparison of prediction maps in Area 2 with different temporal features. The location of Area 2 is shown in Figure 1. (a) Prediction map with a single temporal feature, 4 July 2010; (b) prediction map with two temporal features, 14 March and 4 July 2010; and (c) prediction map with four temporal features, 14 March, 4 July, 6 September, and 9 November 2010.

4.3. Map Calibration with DMSP-OLS Data

Figure 9 shows the classification map of Area 1 before and after filtering with DMSP-OLS nighttime data; the US urban vector map released by the United States Census Bureau is used as a reference. Compared with the original classification map, the number of misclassified pixels and blocks (e.g., forest, cropland, and bare land far from cities, and water bodies) is greatly reduced after filtering.

Figure 9. Comparison of Area 1 (a) pre-masking map, (b) map masked using DMSP-OLS data, and (c) US urban vector map released by the United States Census Bureau. The location of Area 1 is shown in Figure 1.

Misclassification in coarse-resolution images is usually caused by a mixture of spectral signatures within one pixel [56,57]. Settlements are a complex combination of different impervious surface materials, and many different land covers may be mixed in a single pixel of remotely sensed data, especially in coarse-resolution images [42,47]. In addition, different land cover types, such as artificial surfaces, bare soil, and harvested farmland, may have similar spectral characteristics in coarse-resolution images, which also causes spectral confusion.

DMSP-OLS nighttime images (city lights or stable lights) reflect the presence of human activity and are not influenced by surface spectral signals; they have been applied successfully in studies mapping urban land [47,48,49]. However, mapping the spatial pattern of settlements directly from the night-light data is clearly inaccurate because of the over-glow effect, especially near large cities. The lighted areas detected by DMSP-OLS are consistently larger than the geographic extents of the settlements they are associated with [50]. Moreover, because urbanization patterns vary, no single empirical brightness threshold is widely applicable for extracting lit areas that precisely match the actual boundaries of urbanized areas [37,58]. Although no general rules are available to guide the selection of threshold values for mapping settlements, one thing is clear: pixels with a high settlement fraction generally have high DN values in the DMSP-OLS image, whereas certain non-urban land cover types, e.g., forest, bare soil, and water bodies, have no city lights, and their DMSP-OLS DN values should be close to zero. Because of this property, DMSP-OLS data can be used to mask out non-settlement land cover [37]. In the present study, therefore, the DMSP-OLS nighttime images with a threshold (DN < 12) are used mainly to mask out misclassified blocks (e.g., bare earth) far from cities rather than to map city extents. The city extents are still obtained only by the proposed PUL urban mapping procedure. The results indicate that the DMSP-OLS data greatly help to remove mislabeled urban blocks far from cities, which may have spectral properties similar to those of urban land. In addition, DMSP-OLS data are useful for masking out water bodies.
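
The masking step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the PUL urban map and the DMSP-OLS DN image have already been resampled to the same grid, and it reads the threshold as "pixels with DN < 12 are removed from the urban class". The function name `mask_with_night_lights` and the sample arrays are hypothetical.

```python
import numpy as np

DN_THRESHOLD = 12  # threshold quoted in the text; pixels below it are masked out

def mask_with_night_lights(urban_map, dn, threshold=DN_THRESHOLD):
    """Keep urban pixels only where the night-light DN reaches the threshold.

    urban_map: array of urban (True/1) and non-urban (False/0) labels
    dn:        DMSP-OLS stable-light DN values on the same grid (assumption)
    """
    urban_map = np.asarray(urban_map, dtype=bool)
    dn = np.asarray(dn)
    if urban_map.shape != dn.shape:
        raise ValueError("urban_map and dn must be on the same grid")
    # The night lights only remove candidates; they never add urban pixels,
    # so over-glow around cities cannot enlarge the mapped extent.
    return urban_map & (dn >= threshold)

# Synthetic example: two spectrally confused pixels lie in dark areas.
urban_map = np.array([[1, 1, 0],
                      [1, 0, 1]])
dn = np.array([[63, 5, 40],
               [12, 0, 11]])
masked = mask_with_night_lights(urban_map, dn)
print(masked.astype(int))
```

Because the mask is combined with a logical AND, the urban extent itself still comes only from the classifier, in line with the statement that the night-light data are not used to map city extents.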

5. Conclusions

This study introduced, for the first time, the use of a one-class classifier (PUL) for large-scale urban land mapping with coarse-resolution remote-sensing data. We used the PUL algorithm to map urban extents in the US from MODIS data with different seasonal variables for 2010, and calibrated the map using DMSP-OLS night light data. The overall accuracy of the one-class-based urban map reached 92.91% (kappa = 0.85), which shows the effectiveness of the one-class classifier (PUL) in large-scale urban mapping with coarse-resolution remote-sensing data. The use of multi-temporal MODIS imagery significantly helps to separate man-made settlements from vegetation. In addition, although DMSP-OLS stable light data tend to overestimate urban extent because of the light pollution (over-glow) effect, they can be used to effectively mask out bare land and cropland mosaics far from cities. Compared with the MODIS urban map, our results provide more up-to-date and detailed information on the edges of cities. Our results also have a spatial shape similar to that of the US urban vector map but capture more detailed information (e.g., urban trees, grasslands, and water bodies) within city boundaries. Although our result lacks detail compared with the NLCD urban map because of the difference in spatial resolution, the urban area estimated by our method is very close to that of the NLCD map, especially for medium-sized and small cities. Compared with traditional supervised classification methods, our method can significantly reduce the effort needed to assign labels to training samples without losing predictive accuracy, and it shows great potential for mapping urban extent at the global scale. In the future, we will further simplify the procedure for selecting training samples for the proposed one-class classification urban-mapping scheme and produce a highly accurate global-scale urban map.

Acknowledgements

This study is supported by the National Science Foundation of China (project numbers 41471363 and 31270563). The authors would like to thank Schneider, who provided the MODIS Urban Land Cover 500-m dataset, Shiwu Xu, who provided guidelines on sampling strategy, and Lijian Wang, Yafang Chai, and Caoqun Liu for their help with manual classification.

Author Contributions

Bo Wan, Qinghua Guo and Yanjun Su conceived and designed the experiments; Bo Wan and Run Wang performed the experiments; Bo Wan and Fang Fang analyzed the data; Bo Wan contributed materials and analysis tools; Bo Wan, Qinghua Guo and Yanjun Su wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Mills, G. Cities as agents of global change. Int. J. Climatol. 2007, 27, 1849–1857.
2. Foley, J.A.; DeFries, R.; Asner, G.P.; Barford, C.; Bonan, G.; Carpenter, S.R.; Chapin, F.S.; Coe, M.T.; Daily, G.C.; Gibbs, H.K.; et al. Global consequences of land use. Science 2005, 309, 570–574.
3. Friedl, M.A.; Sulla-Menashe, D.; Tan, B.; Schneider, A.; Ramankutty, N.; Sibley, A.; Huang, X. MODIS Collection 5 global land cover: Algorithm refinements and characterization of new datasets. Remote Sens. Environ. 2010, 114, 168–182.
4. Grimm, N.B.; Faeth, S.H.; Golubiewski, N.E.; Redman, C.L.; Wu, J.G.; Bai, X.M.; Briggs, J.M. Global change and the ecology of cities. Science 2008, 319, 756–760.
5. Wu, J.G. Urban ecology and sustainability: The state-of-the-science and future directions. Landsc. Urban Plan. 2014, 125, 209–221.
6. Cai, S.S.; Liu, D.S.; Sulla-Menashe, D.; Friedl, M.A. Enhancing MODIS land cover product with a spatial-temporal modeling algorithm. Remote Sens. Environ. 2014, 147, 243–255.
7. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.C.; Zhao, Y.Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654.
8. Schneider, A.; Friedl, M.A.; Potere, D. Mapping global urban areas using MODIS 500-m data: New methods and datasets based on “urban ecoregions”. Remote Sens. Environ. 2010, 114, 1733–1746.
9. Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens. 2000, 21, 1303–1330.
10. Bartholome, E.; Belward, A.S. GLC2000: A new approach to global land cover mapping from Earth observation data. Int. J. Remote Sens. 2005, 26, 1959–1977.
11. Arino, O.; Gross, D.; Ranera, F.; Leroy, M.; Bicheron, P.; Brockman, C.; Latham, J.; di Gregorio, A.; Brockman, C.; Witt, R.; et al. GlobCover: ESA service for global land cover from MERIS. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2007, Barcelona, Spain, 23–28 July 2007.
12. Bontemps, S.; Defourny, P.; Bogaert, E.V.; Arino, O.; Kalogirou, V.; Rerez, J.E. GLOBCOVER 2009: Products Description and Validation Reports. 2011. Available online: http://ionia1.esrin.esa.int/docs/GLOBCOVER2009_Validation_Report_2.2.pdf (accessed on 14 May 2015).
13. CIESIN, Center for International Earth Science Information Network. Global Rural-Urban Mapping Project (GRUMP), Alpha Version: Urban Extents. 2004. Available online: http://sedac.ciesin.columbia.edu/gpw (accessed on 1 August 2009).
14. Schneider, A.; Friedl, M.A.; Potere, D. A new map of global urban extent from MODIS satellite data. Environ. Res. Lett. 2009, 4.
15. Schneider, A.; Friedl, M.A.; Mciver, D.K.; Woodcock, C.E. Mapping urban areas by fusing multiple sources of coarse resolution remotely sensed data. Photogramm. Eng. Remote Sens. 2003, 69, 1377–1386.
16. Yu, L.; Wang, J.; Li, X.C.; Li, C.C.; Zhao, Y.Y.; Gong, P. A multi-resolution global land cover dataset through multisource data aggregation. Sci. China: Earth Sci. 2014.
17. Vogelmann, J.E.; Howard, S.M.; Yang, L.; Larson, C.R.; Wylie, B.K.; van Driel, J.N. Completion of the 1990’s national land cover data set for the conterminous United States. Photogramm. Eng. Remote Sens. 2001, 67, 650–662.
18. Fry, J.; Xian, G.; Jin, S.; Dewitz, J.; Homer, C.; Yang, L.; Barnes, C.; Herold, N.; Wickham, J. Completion of the 2006 national land cover database for the conterminous United States. Photogramm. Eng. Remote Sens. 2011, 77, 858–864.
19. Homer, C.; Dewitz, J.; Fry, J.; Coan, M.; Hossain, N.; Larson, C.; Herold, N.; McKerrow, A.; VanDriel, J.N.; Wickham, J. Completion of the 2001 national land cover database for the conterminous United States. Photogramm. Eng. Remote Sens. 2007, 73, 337–341.
20. Homer, C.G.; Dewitz, J.A.; Yang, L.; Jin, S.; Danielson, P.; Xian, G.; Coulston, J.; Herold, N.D.; Wickham, J.D.; Megown, K. Completion of the 2011 national land cover database for the conterminous United States-Representing a decade of land cover change information. Photogramm. Eng. Remote Sens. 2015, 81, 345–354.
21. Munoz-Mari, J.; Bruzzone, L.; Camps-Valls, G. A support vector domain description approach to supervised classification of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2007, 45, 2683–2692.
22. Guo, Q.H.; Li, W.K.; Liu, D.S.; Chen, J. A framework for supervised image classification with incomplete training samples. Photogramm. Eng. Remote Sens. 2012, 78, 595–604.
23. Lu, D.; Weng, Q. Extraction of urban impervious surfaces from an IKONOS image. Int. J. Remote Sens. 2009, 30, 1297–1311.
24. Foody, G.M.; Mathur, A.; Sanchez-Hernandez, C.; Boyd, D.S. Training set size requirements for the classification of a specific class. Remote Sens. Environ. 2006, 104, 1–14.
25. Guo, Q.; Li, W.; Liu, Y.; Tong, D. Predicting potential distributions of geographic events using one-class data: Concepts and methods. Int. J. Geogr. Inf. Sci. 2011, 25, 1697–1715.
26. Jeon, B.; Landgrebe, D.A. Partially supervised classification using weighted unsupervised clustering. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1073–1079.
27. Scholkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
28. Li, P.; Xu, H.; Li, S. Urban impervious surface extraction from very high resolution imagery by one-class support vector machine. In Proceedings of the 100 Years ISPRS Advancing Remote Sensing Science, Vienna, Austria, 5–7 July 2010; pp. 366–370.
29. Li, P.J.; Xu, H.Q. Land-cover change detection using one-class support vector machine. Photogramm. Eng. Remote Sens. 2010, 76, 255–263.
30. Sanchez-Hernandez, C.; Boyd, D.S.; Foody, G.M. Mapping specific habitats from remotely sensed imagery: Support vector machine and support vector data description based classification of coastal saltmarsh habitats. Ecol. Inform. 2007, 2, 83–88.
31. Sanchez-Hernandez, C.; Boyd, D.S.; Foody, G.M. One-class classification for mapping a specific land-cover class: SVDD classification of fenland. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1061–1073.
32. Li, W.K.; Guo, Q.H.; Elkan, C. A positive and unlabeled learning algorithm for one-class classification of remote-sensing data. IEEE Trans. Geosci. Remote Sens. 2011, 49, 717–725.
33. Elkan, C.; Noto, K. Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008.
34. Arnold, C.L.; Gibbons, C.J. Impervious surface coverage: The emergence of a key environmental indicator. J. Am. Plan. Assoc. 1996, 62, 243–258.
35. Cohen, B. Urbanization in developing countries: Current trends, future projections, and key challenges for sustainability. Technol. Soc. 2006, 28, 63–80.
36. Duranton, G.; Puga, D. From sectoral to functional urban specialisation. J. Urban Econ. 2005, 57, 343–370.
37. Lu, D.S.; Tian, H.Q.; Zhou, G.M.; Ge, H.L. Regional mapping of human settlements in southeastern China with multisensor remotely sensed data. Remote Sens. Environ. 2008, 112, 3668–3679.
38. McIntyre, N.E. Urban ecology: Definitions and goals. In The Routledge Handbook on Urban Ecology; Douglas, I., Goode, D., Houck, M., Wang, R., Eds.; Routledge: Abingdon, UK, 2011.
39. Potere, D.; Schneider, A. A critical look at representations of urban areas in global maps. GeoJournal 2007, 69, 55–80.
40. Potere, D.; Schneider, A.; Angel, S.; Civco, D.L. Mapping urban areas on a global scale: Which of the eight maps now available is more accurate? Int. J. Remote Sens. 2009, 30, 6531–6558.
41. Small, C. A global analysis of urban reflectance. Int. J. Remote Sens. 2005, 26, 661–681.
42. Weng, Q.H. Remote sensing of impervious surfaces in the urban areas: Requirements, methods, and trends. Remote Sens. Environ. 2012, 117, 34–49.
43. Elvidge, C.D.; Baugh, K.E.; Dietz, J.B.; Bland, T.; Sutton, P.C.; Kroehl, H.W. Radiance calibration of DMSP-OLS low-light imaging data of human settlements. Remote Sens. Environ. 1999, 68, 77–88.
44. Sutton, P.; Elvidge, C.; Obremski, T. Building and evaluating models to estimate ambient population density. Photogramm. Eng. Remote Sens. 2003, 69, 545–554.
45. Zhuo, L.; Ichinose, T.; Zheng, J.; Chen, J.; Shi, P.; Li, X. Modelling the population density of China at the pixel level based on DMSP/OLS non-radiance-calibrated night-time light images. Int. J. Remote Sens. 2009, 30, 1003–1018.
46. National Atlas of the United States. Available online: http://nationalatlas.gov/atlasftp-1m.html (accessed on 10 August 2014).
47. Elvidge, C.D.; Baugh, K.E.; Hobson, V.R.; Kihn, E.A.; Kroehl, H.W.; Davis, E.R.; Cocero, D. Satellite inventory of human settlements using nocturnal radiation emissions: A contribution for the global toolchest. Glob. Chang. Biol. 1997, 3, 387–395.
48. Gallo, K.P.; Elvidge, C.D.; Yang, L.; Reed, B.C. Trends in night-time city lights and vegetation indices associated with urbanization within the conterminous USA. Int. J. Remote Sens. 2004, 25, 2003–2007.
49. Imhoff, M.L.; Lawrence, W.T.; Stutzer, D.C.; Elvidge, C.D. A technique for using composite DMSP/OLS “City Lights” satellite data to map urban area. Remote Sens. Environ. 1997, 61, 361–370.
50. Small, C.; Pozzi, F.; Elvidge, C.D. Spatial analysis of global urban extent from DMSP-OLS night lights. Remote Sens. Environ. 2005, 96, 277–291.
51. Elvidge, C.D.; Cinzano, P.; Pettit, D.R.; Arvesen, J.; Sutton, P.; Small, C.; Nemani, R.; Longcore, T.; Safran, J.; Ebener, S. The NightSat mission concept. Int. J. Remote Sens. 2007, 28, 2645–2670.
52. US Census Bureau. Available online: http://www2.census.gov/geo/tiger/TIGER2010/UA/2010/ (accessed on 24 August 2014).
53. Sung, C.Y.; Li, M.-H. Considering plant phenology for improving the accuracy of urban impervious surface mapping in a subtropical climate regions. Int. J. Remote Sens. 2012, 33, 261–275.
54. Wu, C.; Yuan, F. Seasonal sensitivity analysis of impervious surface estimation with satellite imagery. Photogramm. Eng. Remote Sens. 2007, 73, 1393–1401.
55. Galford, G.L.; Mustard, J.F.; Melillo, J.; Gendrin, A.; Cerri, C.C.; Cerri, C.E.P. Wavelet analysis of MODIS time series to detect expansion and intensification of row-crop agriculture in Brazil. Remote Sens. Environ. 2008, 112, 576–587.
56. Fisher, P. The pixel: A snare and a delusion. Int. J. Remote Sens. 1997, 18, 679–685.
57. Cracknell, A.P. Synergy in remote sensing – What’s in a pixel? Int. J. Remote Sens. 1998, 19, 2025–2047.
58. Ma, T.; Zhou, Y.; Zhou, C.; Haynie, S.; Pei, T.; Xu, T. Night-time light derived estimation of spatio-temporal characteristics of urbanization dynamics using DMSP/OLS satellite data. Remote Sens. Environ. 2015, 158, 453–464.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
