
Statistics

Report
China, Minority Regions and Illiteracy

Contents
Abstract
Introduction
Previous work
Theoretical background
Data
  Descriptive statistics
  Inferential statistics
    Correlation Analysis
    Linear regression
    Discussion
    Limitations of the model and further study
Conclusion
References

Abstract
This study examines the factors affecting education and regional inequality in China. It hypothesizes
that government spending on education is not the only key determinant of the literacy rate within
China; development and integration are key determinants as well.

Introduction
China is a country of many nationalities, comprising the predominant Han and 55 officially recognized
ethnic minorities. It consists of 34 provincial-level administrative divisions, with large inequality in
development between the eastern and western regions. For this reason, government policy makers are
highly concerned with choosing policies that integrate the less developed regions into the core,
both culturally and economically. Language and the literacy rate are key indicators of integration. This
paper aims to propose a model for estimating the illiteracy rate by analyzing a cross-section of the
31 provincial-level regions listed in Table 1. It examines how different factors are related to the
illiteracy rate in those regions. To illustrate the illiteracy problem China is facing, we can examine
the following map of the Chinese provinces and their corresponding illiteracy rates.

Figure 1. Minority Regions in China and Illiteracy Rate

By drawing conclusions about which factors are associated with illiteracy, this study can help Chinese
policy makers choose the best policy for integration.

Previous work
In a study on spatial inequality in education and healthcare in China, Xiaobo Zhang and Ravi
Kanbur conclude that a well-educated labor force is a key asset to China. According to them, the
regional gap in education in China is increasing and demands government action. Furthermore, the
increased economic integration between the regions has weakened the government's ability to
redistribute wealth. Governments can no longer mobilize vast manpower for public works as they did in
the planned era, because labor must be adequately compensated in a market economy.

In another study, titled "Limits to the effectiveness of the 1990 anti-illiteracy campaign in
promoting literacy among women in a rural Chinese community", Regie Stites concludes
that state-sponsored anti-illiteracy campaigns targeted at rural and remote communities are likely to
become increasingly ineffective and to encounter increasing resistance.

Both studies suggest that government spending on education is not always the most
effective way of coping with illiteracy. This justifies the present paper's purpose of examining what
other factors affect illiteracy.

Theoretical background
Maslow's hierarchy of needs classifies people's needs into five categories, arranged as a pyramid
from bottom to top. According to Maslow, a person who has not managed to fulfil the basic needs
(situated at the bottom of the pyramid) will not pursue higher needs. Self-actualization (here taken
to include the need for education) is situated at the top of the pyramid. In other words, it will
only be pursued by people who have already met all their basic needs. This theoretical model
therefore implies that education is related to quality of life: a higher quality of life and the
absence of poverty are essential for fighting illiteracy.

Figure 2. Maslow's hierarchy of needs pyramid

Data
Cross-sectional data on the selected variables are presented in Table 1. The data have been
provided by the National Bureau of Statistics of China.

The numbers in the first column (%Illiterate) represent the illiteracy rate by provinces, recorded in
China in 2014. This data will represent our dependent variable. The rest of the variables represent
data for year 2004, or in other words a lag of 10 years has been used, since education is a long
process and is affected by factors in the past.

The second column (%Minority) states the % of total population in each province, who are minority
population. Due to the lack of data on the ethnic minority population in Non-minority areas, it has
been assumed it is approximately 0.

The third and fourth column (St/T_Prim and St/T_Mid) represent the Student to Teacher ratio in
Primary and in Middle schools in each province respectively. Lower ratio is better and means fewer
students per teacher.

The fifth column (EdFndCap) represents Government Education Funds per Capita. The more the
government spends, the better the education should be.

The sixth and seventh columns are Consumption per capita and GDP per capita and represent the
quality of life in each province. Higher numbers mean higher quality of life.

The eighth and ninth columns (%Water and %Gas) state what portion of the population has access
to water and gas respectively, and represent the level of development in each region.

Region         %Illiterate  %Minority  St/T_Prim  St/T_Mid  EdFndCap    Cnsmpt   GDP/cap  %Water    %Gas
Beijing               1.52       0.00      10.58     11.91   1096.86  12200.40  37058.00  100.00  100.00
Tianjin               2.06       0.00      13.17     14.16    410.46   8802.44  31550.00  100.00  100.00
Hebei                 3.12       1.69      16.84     18.12    184.11   5819.18  12918.00   99.85   98.35
Shanxi                2.09       0.00      18.79     16.55    204.92   5654.15   9150.00   98.14   96.10
Inner Mongolia        4.27      21.78      13.34     15.99    254.73   6219.26  11305.00   96.23   87.93
Liaoning              1.79       4.04      16.71     15.38    294.59   6543.28  16297.00   98.77   96.15
Jilin                 2.27       4.20      12.36     16.46    285.99   6068.99  10932.00   93.84   91.43
Heilongjiang          2.18       0.14      13.21     16.01    258.70   5567.53  13897.00   95.46   85.58
Shanghai              3.64       0.00      14.25     15.34    699.85  12631.03  55307.00  100.00  100.00
Jiangsu               3.78       0.00      19.94     19.41    286.06   7332.26  20705.00   99.69   99.59
Zhejiang              5.38       0.04      21.51     16.43    341.05  10636.14  23942.00   99.97   99.80
Anhui                 7.43       0.00      23.76     24.57    179.31   5711.33   7768.00   98.40   96.14
Fujian                5.06       0.00      16.78     18.72    308.44   8161.15  17218.00   99.42   98.85
Jiangxi               2.75       0.00      19.87     18.70    151.66   5337.84   8189.00   97.73   95.10
Shandong              5.31       0.00      16.57     15.87    207.03   6673.75  16925.00   99.85   99.58
Henan                 4.88       0.00      21.19     20.41    168.14   5294.19   9470.00   92.16   81.98
Hubei                 5.30       4.21      21.42     19.63    213.37   6398.52  10500.00   98.19   95.09
Hunan                 3.12       5.29      17.42     17.77    168.07   6884.61   9117.00   96.86   91.93
Guangdong             2.80       0.22      26.47     21.04    327.46  10694.79  19707.00   97.47   96.89
Guangxi               3.42      39.48      23.46     20.79    184.63   6445.73   7196.00   95.91   93.58
Hainan                4.76      10.00      19.99     21.08    194.32   5802.40   9450.00   98.38   94.59
Chongqing             4.81       5.62      23.85     18.45    215.14   7973.05   9608.00   96.25   93.09
Sichuan               6.67       4.32      23.92     19.34    181.93   6371.14   8113.00   91.76   89.68
Guizhou              10.44      24.67      26.52     21.94    181.79   5494.45   4215.00   92.86   74.86
Yunnan                8.45      26.81      20.12     19.49    230.78   6837.01   6733.00   97.92   71.53
Tibet                41.18      94.92      24.02     19.24    556.66   8338.21   7779.00   96.95   38.62
Shaanxi               4.29       0.00      19.73     19.31    251.92   6233.07   7757.00   96.52   93.75
Gansu                 7.39       7.10      24.51     19.90    225.46   5937.30   5970.00   93.68   80.22
Qinghai              13.53      38.38      18.42     16.64    281.13   5758.95   8606.00   99.08   84.76
Ningxia               7.88      36.41      19.99     17.64    259.89   5821.38   7880.00   96.51   89.08
Xinjiang              4.04      63.06      16.44     16.69    302.85   5773.62  11199.00   98.08   96.37
Table 1. Data

Descriptive statistics
Figure 3. Illiteracy by province (bar chart; vertical axis: %Illiterate, 0 to 45; horizontal axis: provinces)

This bar chart shows how high the illiteracy rate is in each province. The highest illiteracy rate is
observed in Tibet (41.18%), much higher than in all the other provinces, followed by Qinghai (13.53%)
and then Guizhou (10.44%). Illiteracy is lowest in Beijing (1.52%) and in the north-eastern region,
including provinces such as Heilongjiang and Liaoning.

The following histogram shows the distribution of the illiteracy rate across provinces. The Y axis
gives the percentage of observations (provinces) that fall into each class. The X axis shows the
classes of illiteracy rate. As we can clearly see, the majority of provinces (19 of 31, or about
61.3%) have an illiteracy rate from 0% to 5%. About half as many (9 provinces, or 29.0%) have an
illiteracy rate between 5% and 10%.

Figure 4. Illiteracy histogram (vertical axis: percent of provinces, 0 to 70; horizontal axis: %Illiterate)

Frequency Distribution - Quantitative

%Illiterate
                                                        cumulative  cumulative
lower     upper  midpoint  width  frequency  percent    frequency     percent
  0.0  <    5.0       2.5    5.0         19     61.3           19        61.3
  5.0  <   10.0       7.5    5.0          9     29.0           28        90.3
 10.0  <   15.0      12.5    5.0          2      6.5           30        96.8
 15.0  <   20.0      17.5    5.0          0      0.0           30        96.8
 20.0  <   25.0      22.5    5.0          0      0.0           30        96.8
 25.0  <   30.0      27.5    5.0          0      0.0           30        96.8
 30.0  <   35.0      32.5    5.0          0      0.0           30        96.8
 35.0  <   40.0      37.5    5.0          0      0.0           30        96.8
 40.0  <   45.0      42.5    5.0          1      3.2           31       100.0
Total                                    31    100.0
Table 2. Frequency distribution
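
The frequency distribution can be reproduced from the %Illiterate column of Table 1. The following is a minimal sketch rather than the procedure used for the report: the values are transcribed by hand from Table 1, and the function name `frequency_table` and its parameters are illustrative assumptions.

```python
# %Illiterate values transcribed from Table 1, in table order (Beijing ... Xinjiang).
ILLITERATE = [
    1.52, 2.06, 3.12, 2.09, 4.27, 1.79, 2.27, 2.18, 3.64, 3.78,
    5.38, 7.43, 5.06, 2.75, 5.31, 4.88, 5.30, 3.12, 2.80, 3.42,
    4.76, 4.81, 6.67, 10.44, 8.45, 41.18, 4.29, 7.39, 13.53, 7.88,
    4.04,
]


def frequency_table(values, width=5.0, n_classes=9):
    """Count values in classes [lower, lower + width), starting at 0.

    Returns rows of (lower, upper, frequency, percent, cumulative frequency).
    Values at or above the last class boundary are placed in the last class.
    """
    counts = [0] * n_classes
    for value in values:
        index = min(int(value // width), n_classes - 1)
        counts[index] += 1

    rows = []
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        percent = round(100 * count / len(values), 1)
        rows.append((i * width, (i + 1) * width, count, percent, cumulative))
    return rows


for lower, upper, count, percent, cumulative in frequency_table(ILLITERATE):
    print(f"{lower:5.1f} < {upper:5.1f}  {count:3d}  {percent:5.1f}%  {cumulative:3d}")
```

With the default class width of 5, the counts agree with the frequencies reported in Table 2 (19, 9 and 2 provinces in the first three classes, and Tibet alone in the last).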

Descriptive statistics for Illiteracy rate (%Illiterate)

Mean = 5.99 means that the average provincial illiteracy rate is about 6%.

The sample standard deviation of 7.06 is very high relative to the mean, which indicates large
differences from province to province. Some provinces have a very low illiteracy rate, others a very
high one.

Maximum = 41.18% tells us the reason for the high SD: Tibet has a much higher illiteracy rate than
the rest of China.

Skewness is positive, which means that the distribution is not symmetric like the normal distribution
but has a long tail to the right, i.e. it is right-skewed (as can be seen in the histogram). Moreover,
its value of 4.41 is very high, which means the skewness is very pronounced.

The coefficient of variation (CV) is the ratio of the SD to the mean. A value of 118% means very high
variation.

Descriptive statistics
                                 %Illiterate
count                                     31
mean                                  5.9874
sample variance                      49.8621
sample standard deviation             7.0613
minimum                                 1.52
maximum                                41.18
range                                  39.66

confidence interval, 95% lower        3.3973
confidence interval, 95% upper        8.5775
confidence interval, 99% lower        2.4997
confidence interval, 99% upper        9.4751

skewness                              4.4088
kurtosis                             21.8936
coefficient of variation (CV)        117.94%

1st quartile                          2.9600
median                                4.2900
3rd quartile                          6.0250
interquartile range                   3.0650
mode                                  3.1200

low extremes                               0
low outliers                               0
high outliers                              1
high extremes                              1
Table 3. Descriptive Statistics
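
As a rough check, the main figures in Table 3 can be recomputed from the %Illiterate column of Table 1 with the Python standard library. This is a sketch under stated assumptions: the values are transcribed by hand from Table 1, and the constant `T_975_DF30` (the 97.5% quantile of Student's t distribution with 30 degrees of freedom) is hard-coded from a t-table rather than computed.

```python
import math
import statistics

# %Illiterate values transcribed from Table 1, in table order (Beijing ... Xinjiang).
ILLITERATE = [
    1.52, 2.06, 3.12, 2.09, 4.27, 1.79, 2.27, 2.18, 3.64, 3.78,
    5.38, 7.43, 5.06, 2.75, 5.31, 4.88, 5.30, 3.12, 2.80, 3.42,
    4.76, 4.81, 6.67, 10.44, 8.45, 41.18, 4.29, 7.39, 13.53, 7.88,
    4.04,
]

n = len(ILLITERATE)
mean = statistics.mean(ILLITERATE)
sd = statistics.stdev(ILLITERATE)   # sample SD, n - 1 in the denominator
cv = sd / mean                      # coefficient of variation
median = statistics.median(ILLITERATE)

# 95% confidence interval for the mean: mean +/- t * SD / sqrt(n).
T_975_DF30 = 2.0423                 # assumed value taken from a t-table
half_width = T_975_DF30 * sd / math.sqrt(n)
ci_95 = (mean - half_width, mean + half_width)

print(f"count={n} mean={mean:.4f} sd={sd:.4f} cv={cv:.2%}")
print(f"median={median:.2f} 95% CI=({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```

The confidence interval here refers to the mean illiteracy rate across provinces, not to the range in which individual provinces fall.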

Figure 5. Box-plot for Illiteracy rate (%Illiterate); horizontal axis from 0 to 45

The box-plot is a graphical display based on quartiles and combines five elements: minimum value,
first quartile, median, third quartile, and maximum value. It is also very useful for identifying
outlier observations, in this case Qinghai and Tibet.
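
The outlier classification behind the box-plot can be sketched as follows. The fences used here (1.5 and 3 interquartile ranges above the third quartile) are the common box-plot convention and are an assumption about how the software classified the points; the values are transcribed by hand from Table 1.

```python
import statistics

# %Illiterate by province, transcribed from Table 1.
ILLITERATE = {
    "Beijing": 1.52, "Tianjin": 2.06, "Hebei": 3.12, "Shanxi": 2.09,
    "Inner Mongolia": 4.27, "Liaoning": 1.79, "Jilin": 2.27,
    "Heilongjiang": 2.18, "Shanghai": 3.64, "Jiangsu": 3.78,
    "Zhejiang": 5.38, "Anhui": 7.43, "Fujian": 5.06, "Jiangxi": 2.75,
    "Shandong": 5.31, "Henan": 4.88, "Hubei": 5.30, "Hunan": 3.12,
    "Guangdong": 2.80, "Guangxi": 3.42, "Hainan": 4.76, "Chongqing": 4.81,
    "Sichuan": 6.67, "Guizhou": 10.44, "Yunnan": 8.45, "Tibet": 41.18,
    "Shaanxi": 4.29, "Gansu": 7.39, "Qinghai": 13.53, "Ningxia": 7.88,
    "Xinjiang": 4.04,
}

# Quartiles with linear interpolation that includes both end points.
q1, median, q3 = statistics.quantiles(
    ILLITERATE.values(), n=4, method="inclusive"
)
iqr = q3 - q1

inner_fence = q3 + 1.5 * iqr   # above this: outlier
outer_fence = q3 + 3.0 * iqr   # above this: extreme value

outliers = [p for p, v in ILLITERATE.items() if inner_fence < v <= outer_fence]
extremes = [p for p, v in ILLITERATE.items() if v > outer_fence]

print(f"Q1={q1:.4f} median={median:.4f} Q3={q3:.4f} IQR={iqr:.4f}")
print("high outliers:", outliers)
print("high extremes:", extremes)
```

Under these assumptions Qinghai is flagged as a high outlier and Tibet as a high extreme, while Guizhou (10.44%) falls just below the inner fence.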

More descriptive statistics have been compiled using EViews and are displayed below.

In the next graph we can trace the relation between the share of minority population in the
regional population and the illiteracy rate. The positive relation is most clearly observed for the
minority regions of Inner Mongolia, Tibet, Qinghai and Guizhou.

Figure 6. Illiteracy and Minority population (series ILLIT and MINP04 by province; logarithmic vertical axis, 1 to 200)

In this figure we can see that illiteracy and the share of minority population are closely related.
Provinces with a higher concentration of minority population tend to have a higher illiteracy rate.
Many of these ethnic minority groups have their own language and identity, and integrating them is
crucial for China's development and security.

Figure 7. Illiteracy and Student/Teacher ratio in Primary Schools (series ILLIT and PRIM04 by province; vertical axis 0 to 70)

In Figure 7 it can be seen that the two variables, illiteracy rate (%Illiterate) and student to
teacher ratio in primary schools (St/T_Prim), follow a broadly similar pattern. A higher
student/teacher ratio is related to a higher illiteracy rate. With fewer teachers per student, that is,
in classes with many children, each student receives less individual attention and the quality of
education suffers.

Figure 8. Distribution of all variables, Box-plots and Outliers (panels by province: ILLIT, MINP04, FUND04, PRIM04, CONS04)

In Figure 8 we can observe that all variables, except for the student/teacher ratio in primary schools,
have a very high variance. Furthermore, the provinces Beijing, Shanghai, Guangdong, Tibet and Xinjiang
record outlier values in some of those variables. However, these observations cannot be removed,
because that would break the random sampling rule, and because their effects on illiteracy are
important. Instead, using log functions when specifying the linear regression model can help
reduce the effect of the outliers.

Inferential statistics
Correlation Analysis
Before we decide how to model illiteracy, we can use correlation analysis. Correlation analysis
examines the association between variables to describe how they are related to each other. Positive
correlation means that the variables tend to move in the same direction. Negative correlation means
that as one variable rises, the other tends to fall. The correlation coefficient ranges between -1
(perfect negative linear relationship) and 1 (perfect positive linear relationship); a value of 0 means
that there is no linear relationship between the variables.

The following Correlation Matrix has been created using Megastat to help us analyze correlation.

Correlation Matrix
             %Illiterate  %Minority  St/T_Prim  St/T_Mid  EdFndCap  Cnsmpt  GDP/cap  %Water   %Gas
%Illiterate        1.000
%Minority          0.763      1.000
St/T_Prim          0.368      0.206      1.000
St/T_Mid           0.224      0.090      0.815     1.000
EdFndCap           0.137      0.085     -0.440    -0.579     1.000
Cnsmpt            -0.001     -0.137     -0.183    -0.405     0.791   1.000
GDP/cap           -0.247     -0.312     -0.504    -0.576     0.742   0.840    1.000
%Water            -0.104     -0.100     -0.391    -0.388     0.378   0.455    0.536   1.000
%Gas              -0.872     -0.707     -0.357    -0.263    -0.002   0.187    0.414   0.421  1.000

31     sample size
0.355  critical value .05 (two-tail)
0.456  critical value .01 (two-tail)
Table 5. Correlation Matrix

This matrix includes all of the variables. Let us first trace how %Illiterate is related to the
rest of the variables. Its strongest correlations are with %Gas (-0.872, a negative correlation: the
more developed a region is, the lower the illiteracy rate) and with %Minority (0.763, a positive
correlation: the higher the share of ethnic minority population in a province, the higher the observed
illiteracy rate). This implies that these two variables have the best chance of explaining changes in
illiteracy.
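
A single entry of the correlation matrix, and the critical value it is compared with, can be recomputed by hand. The sketch below does this for %Illiterate and %Gas. It is an illustration under assumptions, not the MegaStat procedure: the data are transcribed from Table 1, `pearson_r` is a hypothetical helper, and `T_975_DF29` (the 97.5% quantile of Student's t distribution with 29 degrees of freedom) is hard-coded from a t-table.

```python
import math

# Columns transcribed from Table 1, in table order (Beijing ... Xinjiang).
ILLITERATE = [
    1.52, 2.06, 3.12, 2.09, 4.27, 1.79, 2.27, 2.18, 3.64, 3.78,
    5.38, 7.43, 5.06, 2.75, 5.31, 4.88, 5.30, 3.12, 2.80, 3.42,
    4.76, 4.81, 6.67, 10.44, 8.45, 41.18, 4.29, 7.39, 13.53, 7.88,
    4.04,
]
GAS = [
    100.00, 100.00, 98.35, 96.10, 87.93, 96.15, 91.43, 85.58, 100.00, 99.59,
    99.80, 96.14, 98.85, 95.10, 99.58, 81.98, 95.09, 91.93, 96.89, 93.58,
    94.59, 93.09, 89.68, 74.86, 71.53, 38.62, 93.75, 80.22, 84.76, 89.08,
    96.37,
]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(ILLITERATE, GAS)

# Two-tailed 5% critical value for r with n - 2 degrees of freedom:
# r_crit = t / sqrt(t^2 + df).
T_975_DF29 = 2.0452   # assumed value taken from a t-table
df = len(ILLITERATE) - 2
r_crit = T_975_DF29 / math.sqrt(T_975_DF29 ** 2 + df)

print(f"r = {r:.3f}, critical value = {r_crit:.3f}, significant: {abs(r) > r_crit}")
```

A correlation is treated as significant at the 5% level when its absolute value exceeds the critical value, which is how the 0.355 threshold below the matrix is meant to be read.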

If we look at the rest of the table, we can see that the explanatory variables are also correlated with
one another. For example, Consumption and GDP per capita are naturally highly correlated (0.840).
This tells us that some of the variables in our model might be redundant.

Linear regression
As explained in the "Previous work" and "Theoretical background" sections of the report, the illiteracy
rate is closely linked to government spending, quality of education, poverty and ethnic minority
concentration. Furthermore, it is important to note that all the variables affecting the illiteracy rate
have a lag effect. In other words, changes in illiteracy manifest themselves only years after policy
measures have been applied. This is because education is a long process which lasts for several years.
In this paper's model we assume that primary school education has the largest effect on the literacy
rate. Since literacy is measured on individuals aged 15 years or older, a lag of 10 years has been
assumed, which is roughly the time when those individuals entered primary school.

In all models the explanatory variables enter in log form, so that their coefficients relate to
percent changes, except for the variables which are already expressed as percentages. The models have
been estimated using the OLS method.

The first model we will examine will be a three variable model, attempting to explain %Illiterate
with %Minority and %Gas. As already discussed, we chose those, since the two variables have the
highest correlation with %Illiterate.

1. %Illiterate = α + β1·%Minority + β2·%Gas

Next we can try to add the other variables with higher correlation and check the results. Such a
variable is education (St/T_Prim and St/T_Mid are highly correlated with each other, so using one of
them should be enough, St/T_Prim has a higher correlation with %Illiterate so it should be used).
Another variable is quality of living, which can best be explained by GDP/cap.

2. %Illiterate = α + β1·%Minority + β2·%Gas + β3·log(St/T_Prim) + β4·log(GDP/cap)

Last but not least, we need to examine the effect of the most obvious variable: education funding
(EdFndCap). Since it is highly correlated with GDP/cap (0.742) and moderately correlated with
St/T_Prim (-0.440), those two variables should be omitted when including EdFndCap.

3. %Illiterate = α + β1·%Minority + β2·%Gas + β3·log(EdFndCap)

Dependent Variable: Illiteracy Rate
Observations: 31

Coefficients for
Variable              Model 1     Model 2     Model 3
Intercept            39.73***      -14.67    28.74***
                       (6.59)     (17.19)      (9.31)
%Minority             0.09**     0.10***      0.08**
                       (0.04)      (0.03)      (0.04)
St/T_Prim                         7.04**
                                   (2.68)
St/T_Mid
EdFndCap                                         2.2
                                               (1.35)
Cnsmpt
GDP/cap                          4.00***
                                   (1.23)
%Water
%Gas                 -0.39***    -0.43***     -0.4***
                       (0.07)      (0.06)      (0.07)

R squared                0.80        0.86        0.82
Adjusted R squared       0.79        0.84        0.80

*p<0.1, **p<0.05, ***p<0.01

Table 6. Estimation of coefficients
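
For readers who want to retrace Model 1 without statistical software, the sketch below fits %Illiterate on %Minority and %Gas by ordinary least squares using the closed-form solution for two regressors. It is an illustration only: the data are transcribed by hand from Table 1, `fit_two_regressors` is a hypothetical helper, and no standard errors or p-values are computed.

```python
# Columns transcribed from Table 1, in table order (Beijing ... Xinjiang).
ILLITERATE = [
    1.52, 2.06, 3.12, 2.09, 4.27, 1.79, 2.27, 2.18, 3.64, 3.78,
    5.38, 7.43, 5.06, 2.75, 5.31, 4.88, 5.30, 3.12, 2.80, 3.42,
    4.76, 4.81, 6.67, 10.44, 8.45, 41.18, 4.29, 7.39, 13.53, 7.88,
    4.04,
]
MINORITY = [
    0.00, 0.00, 1.69, 0.00, 21.78, 4.04, 4.20, 0.14, 0.00, 0.00,
    0.04, 0.00, 0.00, 0.00, 0.00, 0.00, 4.21, 5.29, 0.22, 39.48,
    10.00, 5.62, 4.32, 24.67, 26.81, 94.92, 0.00, 7.10, 38.38, 36.41,
    63.06,
]
GAS = [
    100.00, 100.00, 98.35, 96.10, 87.93, 96.15, 91.43, 85.58, 100.00, 99.59,
    99.80, 96.14, 98.85, 95.10, 99.58, 81.98, 95.09, 91.93, 96.89, 93.58,
    94.59, 93.09, 89.68, 74.86, 71.53, 38.62, 93.75, 80.22, 84.76, 89.08,
    96.37,
]


def fit_two_regressors(y, x1, x2):
    """OLS fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2, r_squared)."""
    n = len(y)
    mean_y, mean_1, mean_2 = sum(y) / n, sum(x1) / n, sum(x2) / n

    # Centered sums of squares and cross-products.
    s11 = sum((u - mean_1) ** 2 for u in x1)
    s22 = sum((v - mean_2) ** 2 for v in x2)
    s12 = sum((u - mean_1) * (v - mean_2) for u, v in zip(x1, x2))
    s1y = sum((u - mean_1) * (w - mean_y) for u, w in zip(x1, y))
    s2y = sum((v - mean_2) * (w - mean_y) for v, w in zip(x2, y))
    syy = sum((w - mean_y) ** 2 for w in y)

    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean_y - b1 * mean_1 - b2 * mean_2

    r_squared = (b1 * s1y + b2 * s2y) / syy   # explained / total variation
    return a, b1, b2, r_squared


a, b1, b2, r_squared = fit_two_regressors(ILLITERATE, MINORITY, GAS)
print(f"intercept={a:.2f} %Minority={b1:.2f} %Gas={b2:.2f} R^2={r_squared:.2f}")
```

The point estimates should be close to the Model 1 column of Table 6; small differences can arise from rounding in the published data.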

The results have been computed using EViews. The coefficients are marked with stars according to
their p-values; more stars mean higher statistical significance. Standard errors are given in
brackets (). R squared is high for each of the models (0.80 to 0.86), showing that the models explain
most of the variation in the dependent variable %Illiterate. Based on the computations, the
following conclusions can be made:

Discussion
Variables %Minority and %Gas are both very consistent predictors of the dependent variable, as
shown by their low p-values in all models. However, %Gas has coefficients that are larger in absolute
value, which means it has a bigger effect: the same change in it results in a larger change
in %Illiterate than an equal change in %Minority. According to Model 2, an increase of 1 percentage
point in the share of people who have access to gas in a province (which shows the level of
development and urbanization in that province) is associated with an illiteracy rate lower by 0.43
percentage points, whereas a decrease of 1 percentage point in %Minority is associated with a
decrease of only 0.10 percentage points.

Model 2 has the highest R squared and the highest adjusted R squared, which makes it the best of the
three models. An R squared of 0.86 means that the model explains 86% of the variance of the dependent
variable %Illiterate.
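
Because R squared never falls when regressors are added, the adjusted R squared is the fairer basis for comparing models of different size. The sketch below applies the standard adjustment formula to the rounded R squared values from Table 6; the function name is an illustrative assumption, and the regressor counts (2, 4 and 3) follow from the three model equations.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


N_OBS = 31
# (R squared from Table 6, number of regressors excluding the intercept)
MODELS = {"Model 1": (0.80, 2), "Model 2": (0.86, 4), "Model 3": (0.82, 3)}

for name, (r2, k) in MODELS.items():
    print(f"{name}: adjusted R^2 = {adjusted_r_squared(r2, N_OBS, k):.2f}")
```

Model 2 keeps the highest value even after the penalty for its two additional regressors.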

Both the number of students per teacher in primary schools (St/T_Prim) and GDP/cap are strong
variables in predicting the illiteracy rate. The first captures the effect of better education. The
second captures the effect of a higher quality of living. Both are statistically significant
(St/T_Prim at the 5% level, GDP/cap at the 1% level) and their coefficients are large (7.04 and 4.00
respectively). Because these two variables enter the model in logs while %Illiterate does not, a 1%
decrease in the number of students per teacher is associated with a decrease in the illiteracy rate of
about 7.04/100 = 0.07 percentage points (assuming natural logarithms), other variables held constant.
Strangely, GDP/cap has a positive coefficient, so a higher GDP per capita would imply a higher
illiteracy rate once the other variables are controlled for.
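
When a regressor enters in logs and the dependent variable does not, a coefficient β translates into a change of β·ln(1 + p/100) in the dependent variable for a p% change in the regressor, which is roughly β/100 per 1%. The sketch below works this out for the two logged variables of Model 2. It assumes natural logarithms, and the function name is illustrative.

```python
import math


def effect_of_percent_change(beta, percent_change):
    """Change in y (in its own units) when x changes by `percent_change` percent
    in a model y = a + beta * ln(x) + ..., other regressors held constant."""
    return beta * math.log(1 + percent_change / 100)


# Coefficients of the logged regressors in Model 2 (Table 6).
BETA_ST_T_PRIM = 7.04
BETA_GDP_CAP = 4.00

for label, beta in [("St/T_Prim", BETA_ST_T_PRIM), ("GDP/cap", BETA_GDP_CAP)]:
    up = effect_of_percent_change(beta, 1)
    down = effect_of_percent_change(beta, -1)
    print(f"{label}: +1% -> {up:+.4f} points, -1% -> {down:+.4f} points")
```

The effects are measured in percentage points of the illiteracy rate, so even the large coefficient on St/T_Prim corresponds to a modest change for a 1% change in class size.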

Last but not least, in Model 3 we can trace the effect of government funding on the illiteracy rate. Its
coefficient is equal to 2.2 and, as reported in Table 6, positive, so the model does not show that more
funding reduces illiteracy: a 1% increase in government funding is associated with an illiteracy rate
about 0.02 percentage points higher. However, its significance is low. The p-value is equal to 0.12,
so the coefficient is not statistically different from zero even at the 10% level. This means
that education funding in itself is not the most reliable way of improving the quality of education and
decreasing illiteracy, or that by itself it is not enough.

Finally, below we can view a graphical representation of the OLS fit that our model (Model 2)
proposes for each variable.

Figure 9. Fit lines

Limitations of the model and further study
The models proposed in this paper have the following limitations:

They lack good theoretical backing.
They use a limited number of variables.
They only use cross-sectional data.

Examining more theory, gathering more data and especially including time series could contribute
greatly to this area of study.

Conclusion
Every strong country needs a productive, well-educated labor force if it wants to be competitive on
the international market. Literacy is also important for maintaining a nation's unity. China has
achieved remarkable progress in promoting literacy among its citizens in the past. This paper has
analyzed data related to the illiteracy rate, which remains high in some of the minority areas of
China. Different factors and their effects on illiteracy have been discussed, and three models for
evaluating government policy have been formulated. As the results show, the illiteracy rate is highly
related not only to government spending on education, but also to integration, development and living
standard. However, each of these variables has a lag effect, and changes take years to manifest in
results. Therefore every government needs to undertake careful planning where regional development is
concerned.

References
https://en.wikipedia.org/wiki/Maslow's_hierarchy_of_needs

http://www.sciencedirect.com/science/article/pii/S1043951X05000179

http://www.sciencedirect.com/science/article/pii/073805939400051P

http://www.stats.gov.cn/
