Spatial Interpolation of Population Change (%) Across Canada
Geofficient Strategies Analysis and Formal Report 21 March 2014
1.0 Introduction
The population of Canada has been steadily increasing since Confederation in 1867. In 2013, it reached the 35 million mark.¹ Population trends raise questions of where change is occurring and why. The goal of this study is to identify areas of population change and to explore possible reasons for increases or decreases in particular areas. A geostatistical analysis will be undertaken for all of Canada using 147 cities and their population change from 2006 to 2011. Based on the cities with available data, the population change of surrounding areas will be predicted to determine nation-wide trends.
The essential part of this study involves performing spatial interpolation using two methods: Kriging and Inverse Distance Weighting (IDW). Spatial interpolation is the process of predicting values of a variable at unsampled locations from values measured at known locations. The 147 cities with known population change values will be used to predict population change in parts of Canada where it is unknown. The predictions depend on the parameters of each method and are primarily influenced by nearby known values. These methods are explored further throughout the report.
1.1 Study Area
Canada is the study area, but it is bounded by the extent of the populated cities. The northern boundary lies at 63°N latitude, where Yellowknife, Northwest Territories, is located, and the southern boundary is at 42°N. Because the population north of 63°N is too small to provide data, this area has been omitted. All cities fall between 136°W and 52°W longitude (Figure 1).
1 Canadian Population Surpasses 35 Million, CBC News, September 26 2013
Figure 1: Study Area and Cities
1.2 Data Analysis
Population statistics were obtained from Statistics Canada² for the top 147 municipalities with populations over 5,000. The main variable of the study is the percentage population change from 2006 to 2011, with positive values indicating an increase and negative values a decrease. Locational data for the 147 cities were obtained from Geocoder.ca,³ with latitude and longitude used as the Y and X coordinates. Exploratory data analysis was undertaken to identify characteristics of the data such as mean, variance, directional influences, and any trends.
The population change data is almost normally distributed but positively skewed (Figure 2).
Figure 2: Population Change % Histogram
This indicates that most of the data lie at lower values around the median. The median is 4.1% compared to a mean of 5.0%, which is consistent with a positively skewed dataset. The standard deviation is 6.3%, a measure of how far a typical population change value lies from the mean. For normally distributed data, about 99.7% of values are expected to fall within ±3 standard deviations of the mean (Smith), which here is between -13.9% and 23.9%. Two data points are outside this range: Wood Buffalo at 27.1% and Okotoks at 42.9%. The range of the entire data set is -4.5% to 42.9%, but only one value is above 28%. Okotoks can be considered an outlier since its value is not near any others. Upon further investigation, this city is in
2 Statistics Canada, 2011 Census Report
3 Obtained from Geocoder.ca
Alberta, which is experiencing a province-wide population increase.⁴ This point will be kept in the study as it is indicative of an area where a population boom is occurring. The high kurtosis value of 11.9 indicates peakedness and a long tail,⁵ which is visible on the histogram. The Q-Q plot compares the distribution of observed points to a theoretical normal distribution (Figure 3).
Figure 3: Normal Q-Q Plot of Population Change
The data around the median closely follow the theoretical normal distribution. The values differ most at both ends of the distribution. This is consistent with skewed data and data with a heavy tail.⁶ Despite this, there are no significant issues with the distribution of the population change dataset.
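The three-standard-deviation check described earlier in this section can be reproduced from the reported summary statistics alone. The short sketch below assumes that the 5.0% figure is the mean and uses only the two city values quoted in the text; the variable names are illustrative.

```python
# Summary statistics as reported in Section 1.2 (percent population change).
mean_pct = 5.0  # assumed to be the mean quoted alongside the 4.1% median
sd_pct = 6.3    # reported standard deviation

# Bounds of the +/- 3 standard deviation interval around the mean.
lower = mean_pct - 3 * sd_pct
upper = mean_pct + 3 * sd_pct

# The two city values quoted in the text as lying outside the interval.
reported = {"Wood Buffalo": 27.1, "Okotoks": 42.9}
flagged = {city: v for city, v in reported.items() if v < lower or v > upper}

print(f"3-sigma interval: {lower:.1f}% to {upper:.1f}%")
print("Outside interval:", sorted(flagged))
```

Both cities fall above the upper bound, which matches the range of -13.9% to 23.9% given in the text.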
The spatial distribution of cities is not normal (Figure 4).
4 Statistics Canada, Annual Demographic Estimates, 2013
5 David Lane, Introduction To Statistics
6 David Lane, Introduction To Statistics
Figure 4: Histograms of (A) Latitude and (B) Longitude
Latitude is positively skewed, with many cities at lower latitudes, while longitude has a multi-modal distribution. Additionally, the map of the study area makes it clear that the cities are not evenly distributed throughout Canada. Cities were chosen based on available data and were required to have a population above 5,000. The size of Canada and the sparseness of its settlement make it difficult to obtain sample locations with a normal spatial distribution. This concept is explored further in the Discussion (Section 4.0).
A trend analysis of all three variables, Longitude (X), Latitude (Y), and Population Change (Z), can be seen in Figure 5.
Figure 5: Trend Analysis For X, Y and Z
This graph shows the trends between XY, YZ, and XZ. The trend line for XZ (Longitude vs Population Change) shows a minimal trend at the second-order polynomial level. At this level, YZ (Latitude vs Population Change) shows a stronger trend due to the concentration of lower-latitude cities. These small second-order trends can be removed during the Kriging process.
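To illustrate what removing a second-order trend involves, the sketch below fits a quadratic surface in X and Y by least squares and subtracts it from Z. It is a minimal example on synthetic data, not the procedure ArcGIS applies internally; the function names, the synthetic surface, and the centring of coordinates are assumptions made for the illustration.

```python
import numpy as np

def quadratic_design(x, y):
    """Design matrix for a second-order polynomial surface in x and y."""
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def remove_second_order_trend(x, y, z):
    """Fit a quadratic trend surface by least squares and return the residuals.

    Coordinates are centred first for numerical stability, so the returned
    coefficients refer to the centred coordinates.
    """
    xc, yc = x - x.mean(), y - y.mean()
    A = quadratic_design(xc, yc)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return z - A @ coeffs, coeffs

# Synthetic example: locations within the study extent and a purely
# quadratic surface, so the residuals should be essentially zero.
rng = np.random.default_rng(0)
x = rng.uniform(-136.0, -52.0, 50)  # longitude
y = rng.uniform(42.0, 63.0, 50)     # latitude
z = 2.0 + 0.01 * x - 0.1 * y + 0.001 * x**2

residuals, coeffs = remove_second_order_trend(x, y, z)
print("largest absolute residual:", np.abs(residuals).max())
```

With real data the residuals would not vanish; they would be the detrended values passed on to the interpolation step.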
A semi-variogram describes the spatial relationship between data points.⁷ In this study, it compares the distance between every pair of cities with the difference in their population change values and plots the result on a graph (Figure 6).
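The pairwise comparison behind an empirical semi-variogram can be sketched as follows. This is a simplified illustration: it uses straight-line distance on the input coordinates and equal-width lag bins, and the function name, lag settings, and three-point example are assumptions rather than the settings used in the study.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_size, n_lags):
    """Return lag centres and semivariance for each non-empty distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Every unordered pair of points (i < j).
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    centres, gamma = [], []
    for k in range(n_lags):
        lo, hi = k * lag_size, (k + 1) * lag_size
        in_bin = (dist >= lo) & (dist < hi)
        if in_bin.any():
            centres.append((lo + hi) / 2)
            # Semivariance is half the mean squared difference in the bin.
            gamma.append(0.5 * sq_diff[in_bin].mean())
    return np.array(centres), np.array(gamma)

# Tiny illustrative dataset: three points on a line.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
values = [0.0, 1.0, 3.0]
lags, gamma = empirical_semivariogram(coords, values, lag_size=1.0, n_lags=3)
print(lags, gamma)
```

A semi-variogram that stays roughly flat as the lag increases, as reported for this dataset, suggests that nearby pairs are not much more alike than distant pairs.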
Figure 6: Semi-Variogram For Population Change
This semi-variogram has a mostly horizontal form, which means that there is minimal spatial autocorrelation at a Canada-wide scale.⁷ This is plausible, since population change in Calgary would not be expected to influence population change in Toronto. Since the sampling distance varies greatly from coast to coast, the semi-variogram should ideally be adjusted for each neighbourhood during spatial interpolation.
In summary, the data for population change is relatively normal with a positive skew, indicating that a large portion of values lie around the median of 4.1%. One data point, Okotoks, AB, with a unique 42.9% population change, will be kept as this area has been experiencing high population increase compared with the whole of Canada. It has been determined that there is
7 Gregg Babish, Geostatistics Without Tears
little to no spatial autocorrelation between cities on opposite sides of the country. This will be kept in mind when setting neighbourhoods for spatial interpolation.
2.0 Methodology
The two methods used for spatial interpolation were IDW and Kriging. Both were run in ArcGIS 10.1 using the Geostatistical Wizard. All procedures and map features used the GCS North American 1927 coordinate system. The findings of the Data Analysis (Section 1.2) were used to create the best possible spatial interpolation models.
2.1 Inverse Distance Weighting
Inverse Distance Weighting is an effective way to take an initial look at a dataset. It has few input parameters and does not make any assumptions about the data.⁸ This method predicts values at unknown points by assigning weights to the surrounding known data points. When an unknown point is interpolated, the closest data points have more influence than those at greater distances. Proximity therefore has the greatest effect on predictions in IDW. In ArcGIS, the rate at which influence falls off with distance is controlled by the Power parameter. The following parameters were used to produce the IDW result:
Figure 7: IDW Parameters
8 ArcGIS 10.1 Geostatistical Wizard
These parameters were tested to obtain the lowest Root Mean Square (RMS) error. The power was set to 1 to give higher weights to closer cities, with diminishing weights given to cities farther from the unknown point. The minimum number of neighbours was set to 5 and the maximum to 15 so that each prediction used a limited neighbourhood. This limited the influence that more distant cities had on the prediction.
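The weighting scheme described above can be sketched in a few lines. This is a minimal illustration of IDW with a power parameter and a cap on the number of neighbours; it is not the ArcGIS implementation, it omits the minimum-neighbour and sector settings, and the function name and two-point example are assumptions.

```python
import numpy as np

def idw_predict(coords, values, query, power=1.0, max_neighbours=15):
    """Predict a value at `query` as a distance-weighted mean of nearby points."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    dist = np.linalg.norm(coords - np.asarray(query, dtype=float), axis=1)

    # Keep only the closest points, up to the neighbourhood limit.
    nearest = np.argsort(dist)[:max_neighbours]
    d, v = dist[nearest], values[nearest]

    # A query that coincides with a known point returns that point's value.
    if d[0] == 0.0:
        return float(v[0])

    weights = 1.0 / d**power  # closer points receive larger weights
    return float(np.sum(weights * v) / np.sum(weights))

# Two known points with values 0 and 10; predict a quarter of the way along.
coords = [[0.0, 0.0], [2.0, 0.0]]
values = [0.0, 10.0]
p1 = idw_predict(coords, values, (0.5, 0.0), power=1.0)
p2 = idw_predict(coords, values, (0.5, 0.0), power=2.0)
print(p1, p2)
```

In this example the prediction moves closer to the nearer point's value as the power increases, which is the effect the Power parameter controls.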
2.2 Kriging
Kriging is a much more complex model for spatial interpolation. It has many more input parameters, and ArcGIS provides an interactive process for adjusting them. This method relies on statistical relationships between the measured points.⁹ It uses the semi-variogram, and weights obtained from it, in the interpolation process.¹⁰ Simple Kriging was used since it is the method that accepts a dataset containing negative values. The following parameters were used:
Figure 8: Parameters for Kriging
9 ArcGIS 10.1 Help Files
10 Gregg Babish, Geostatistics Without Tears
A normal score transformation was used to normalize the data and make variances more consistent throughout the study area.¹¹ This was the only available option with the use of negative data. Declustering was undertaken to reduce the effect of preferential sampling and to correct the data distribution estimate.¹ Second-order trends were removed as per the Trend Analysis in Section 1.2. The Power setting was increased to 2 in this method to give higher weights to cities closer to the prediction point, on the reasoning that closer cities should have more influence than cities further away. The Searching Neighbourhood used similar parameters to the IDW method. Five neighbours were set as a minimum since some areas were isolated from the bulk of the cities. Fifteen was set as a maximum in order to include only nearby cities. Limiting the number of influencing cities reduced the influence of cities further away, which most likely have no correlation with local population change. The number of lags and the lag size were set to include all points on the semi-variogram. Other settings were kept at the ArcGIS defaults.
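The weighting idea behind Simple Kriging can be sketched briefly. The following is a minimal illustration, not the ArcGIS implementation: the exponential covariance model, the `sill` and `range_` values, the known mean, and the sample points are all assumptions chosen for the example, and the normal score transformation, declustering, trend removal, and search neighbourhood described above are omitted.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=10.0):
    """Exponential covariance model (an assumed model for illustration)."""
    return sill * np.exp(-h / range_)

def simple_kriging(coords, values, query, mean, sill=1.0, range_=10.0):
    """Simple Kriging prediction and variance at a single query location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    query = np.asarray(query, dtype=float)

    # Covariance between all pairs of data points.
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    C = exp_cov(h, sill, range_)

    # Covariance between each data point and the query location.
    c0 = exp_cov(np.linalg.norm(coords - query, axis=1), sill, range_)

    # Kriging weights solve C w = c0; the known mean is added back afterwards.
    w = np.linalg.solve(C, c0)
    prediction = mean + w @ (values - mean)
    variance = sill - w @ c0
    return float(prediction), float(variance)

# Hypothetical sample: three points with an assumed known mean of 5.0.
coords = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
values = [2.0, 8.0, 4.0]
at_data, var_at_data = simple_kriging(coords, values, (5.0, 0.0), mean=5.0)
far_away, var_far = simple_kriging(coords, values, (1e6, 1e6), mean=5.0)
print(at_data, var_at_data, far_away, var_far)
```

Under these assumptions the prediction reproduces the measured value at a data location and falls back to the mean far from any data, where the kriging variance approaches the sill. This is one reason predictions in areas without sample points carry little information.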
3.0 Results
The IDW result provides an effective initial display of population change across Canada (Figure 9). The map shows that Eastern Canada is generally experiencing a population decrease, although certain areas are experiencing an increase. The London to Montreal corridor shows low population increases. Southern parts of New Brunswick and Nova Scotia also appear to be areas of population increase. The most noticeable aspect of the map is the higher population increases in Western Canada. The highest population increase is occurring in Alberta and western Saskatchewan. Current trends support this analysis, as Alberta has the highest growth rate of any province.¹¹ Small areas with higher population increases in Alberta play a role in making the whole province appear to be an attractive living area. Northern Canada, north of Yellowknife, NWT, shows varying trends for population change. Since there were no data in this area, these results are not reliable. The influence of Western Canada appears to have created positive trends in the north. This area was included for completeness.
11 James Wood, Calgary Herald, Alberta Population Cracks 4 Million
Figure 9: IDW Prediction Surface For Population Change In Canada
The Kriging result shows a more organized spatial interpolation result (Figure 10). It shows similar trends to the IDW result but differs visually. Eastern Canada is heavily coded in blue, indicating that populations are decreasing. The Golden Horseshoe is experiencing population increases, while an area just east of Toronto seems to be decreasing. The decreasing trend in central Ontario seems to have affected this area east of Toronto. The southern tip of Ontario is also experiencing a population decrease. Western Canada is again experiencing the highest population increases. Alberta has a similar pattern to IDW, with a high population increase. Two distinct areas in British Columbia appear to be experiencing a population decrease that was not apparent in IDW. The Yukon Territory has the highest predicted population increase, as the dark red shading shows. There were few data points in this area, so the prediction is most likely inaccurate. Northern Canada, including the NWT, Nunavut and Northern Quebec, appears to be experiencing population decreases. Since no data points were north of 63°N latitude, these results are unlikely to be accurate. Further research showed that Nunavut has been experiencing population increases, so this part of the prediction can be disregarded.¹²
12 Statistics Canada, 2006 Aboriginal Census Profiles
Figure 10: Kriging Prediction Surface For Population Change In Canada
4.0 Discussion
This section discusses the validity of both methods and results. IDW is an effective method for taking a preliminary look at a dataset. Because it has few inputs, its ability to predict surfaces from complex datasets is limited. The predicted surface for IDW did a fair job of predicting the real-life surface. The graphs in Figure 11 show how well the model predicted population change percentage.
Figure 11: IDW Cross Validation Results
Both graphs show that the predicted points vary along the trend line. They are somewhat close to the line, indicating an effective model, and there are few outliers. The model was tested by adjusting parameters such as neighbourhood, sector type, and power to obtain the lowest possible Root Mean Square (RMS) error. The resulting RMS value of 5.25 was the minimum found after testing the various parameters. Factors such as widely dispersed sampling, clustering, and locational correlation negatively affected the prediction surface.
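The parameter testing described above rests on cross validation: each known point is removed in turn, predicted from the remaining points, and the errors are summarized as an RMS value. The sketch below shows one way this could be done for IDW. It is an assumption-laden illustration using a hypothetical three-point dataset and a simplified IDW without neighbourhood limits, not a reproduction of the ArcGIS cross-validation output.

```python
import numpy as np

def idw(coords, values, query, power=1.0):
    """Simplified IDW prediction using all supplied points."""
    dist = np.linalg.norm(coords - query, axis=1)
    if np.any(dist == 0.0):
        return float(values[np.argmin(dist)])
    weights = 1.0 / dist**power
    return float(np.sum(weights * values) / np.sum(weights))

def loo_rmse(coords, values, power=1.0):
    """Leave-one-out RMS error: predict each point from all the others."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        predicted = idw(coords[keep], values[keep], coords[i], power)
        errors.append(predicted - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical data: three evenly spaced points on a line.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
values = [0.0, 1.0, 2.0]
rmse = loo_rmse(coords, values, power=1.0)

# Comparing candidate powers mirrors the search for the lowest RMS.
candidate_powers = [1.0, 2.0, 3.0]
best_power = min(candidate_powers, key=lambda p: loo_rmse(coords, values, p))
print(rmse, best_power)
```

The same loop could compare neighbourhood sizes or sector types by adding them as arguments, at the cost of more combinations to test.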
Kriging uses many input parameters, with even more available to advanced users. This can be a drawback for first-time users, but the method can be very effective for users familiar with it. The Kriging result produced for this study had the following validation results:
Figure 12: Kriging Cross Validation Results
The Predicted graph (left) shows a large cluster of points in the center of the trend line. This indicates that the prediction process was fairly successful. There are a few outliers whose measured values were much lower than predicted. The Error graph (right) shows a similar but negative trend. The clustering indicates that many points had similar error values. The average standard error was quite high at about 4.50%. Considering that the original data were heavily centered around the median of 4.1%, an error of 4.50% could undermine the effectiveness of this predicted surface.
Overall, Kriging provided a slightly better result than IDW. The trend of a population shift towards Western Canada is clear and distinct in the Kriging result. The biggest challenge when performing spatial interpolation was the data set. Having negative values greatly reduced the choice of Kriging types and transformation methods, since the others did not work with negative values. Furthermore, the distribution of the dataset across all of Canada may have been an impediment to the success of the model. Population change in cities on the east coast would most likely have little effect on cities in the west. This was acknowledged and partially mitigated by setting Search Neighbourhood values to include only nearby cities. An improvement to this study would be to focus on an individual province. Cities could then be chosen to represent all areas of the province. Cities in Canada are in no way normally distributed. Most of the population lies in the southern portion of the country, with the north being very sparsely populated.
5.0 Conclusion
Kriging is an advanced interpolation method that can outperform IDW for data sets with normality issues, directional influences, and global trends. IDW provides a quick and easy interpolation method for basic analysis. Both methods aided in the analysis of population change in Canada. The results of this study show that the Canadian population is increasing at the highest rates in Western Canada. Both models show that Alberta is the province with the highest population growth rates. The predictions north of 63°N latitude should not be considered reliable, as this area lacked measured data points. Also, since these northern areas are sparsely populated, a predicted surface is not really necessary for them; it was created for this study merely for continuity. The predicted surfaces also show that Eastern Canada has lower, and in places negative, population growth rates. The exceptions include the Golden Horseshoe, the Ottawa region, and southern parts of New Brunswick and Nova Scotia. Still, none of these eastern regions has growth rates similar to those of the west.
Bibliography
Babish, G. (2000). Geostatistics Without Tears. Saskatchewan.
CBC News. (2013, September 26). Canadian population surpasses 35 million. Retrieved Feb 24, 2014, from CBC: http://www.cbc.ca/news/canadian-population-surpasses-35-million-1.1869011
ESRI. (2013). Geostatistical Wizard. ArcGIS 10.1.
Lane, D. (n.d.). Chapter 8 Advanced Graphs. Retrieved 2014, from Online Statistics Education: An Interactive Multimedia Course of Study.
Smith, I. (2014). Deliverable 4. Geostatistical Analysis of Student Obtained Spatial data. NOTL, ON, Canada.
Smith, I. (2014). GISC 9308- Introduction To Statistics. Niagara College.
Statistics Canada. (2013, 06 19). Annual Demographic Estimates. Retrieved 2014, from Statistics Canada: http://www.statcan.gc.ca/pub/91-215-x/2012000/part-partie1-eng.htm
Statistics Canada. (2014, January 13). Population and Dwelling Counts. Retrieved January 31, 2014, from Statistics Canada: http://www12.statcan.gc.ca/census-recensement/2011/dp-pd/hlt-fst/pd-pl/Table-Tableau.cfm?LANG=Eng&T=307&S=11&O=A&RPP=699
Wood, J. (2013, 09 26). Alberta Population Cracks 4 Million. Retrieved 2014, from Calgary Herald.