Crop Yield Prediction

PREDICTION OF RICE CROP YIELD USING MULTIPLE LINEAR REGRESSION
By: Ma. Graciela Vic C. Elape
I. INTRODUCTION
Agriculture is one of the sources of economic activity in the Philippines. The
country is considered the world's eighth-largest rice producer, and rice is a
major staple food for almost all Filipinos across the country. For the
Philippines to become more productive in producing rice, it has to adopt existing
technologies to increase its yield.
Yield prediction is one of the most common agricultural problems.
Every farmer wants to know how much yield to expect. Farmers usually estimate
their yield based on previous experience. When the data from previous yields are
turned into information, they become very useful for many purposes, especially
for predicting future yield.
Data mining is widely applied to solve agricultural problems. It is
used extensively to analyze large data sets and establish patterns in them, so
that the data can be transformed into a more understandable structure for future use.
The main objective of this paper is to create a platform for Filipino farmers
that provides an analysis of rice production based on the available data.
The multiple linear regression technique is used to predict the crop yield so
that productivity can be maximized.

II. SIGNIFICANCE OF THE STUDY
Rice farming is a major source of income and livelihood in some
developing countries, especially in the Philippines. Rice is also considered the
most important crop, and the majority of Filipino farmers rely on it.
If farmers could know their future yield, they could also maximize their
production, since they would already have an idea of how much fertilizer
must be used.
III. OBJECTIVES OF THE STUDY
This study focuses on developing a platform that analyzes data on rice
production and predicts future yield using multiple linear regression.
Specifically, this study aims to:
1. Predict the future rice crop yield based on the following input data from
2003 to 2014:
   - Year – specifies the year in which the data are available
   - Area harvested – specified in hectares
   - Area applied – specified in hectares
   - Average quantity applied – specifies the average amount of fertilizers in
     bags of 50 kilograms
   - Specific amount of fertilizers applied (urea, ammosul, ammophos, and
     others) – specified in bags of 50 kilograms
   - Yield – specified in metric tons
2. Derive a formula from the given input data for predicting the rice crop yield.
3. Predict the future yield using the multiple linear regression method in MS
Excel.
IV. SCOPE OF THE STUDY
The study focuses on predicting the yield of the rice crop from the available
national data. The data were gathered through CountryStat Philippines, a
platform provided by the Philippine Statistics Authority (PSA) that is
designed to generate crop information and other agricultural statistical
information.
The yield prediction is based on the following input data: year, area harvested,
area applied, average quantity applied, the specific fertilizers applied (urea,
ammosul, ammophos, and others), and the yield. Multiple linear regression
is used as the data mining technique to generate the desired output.
V. RELATED LITERATURE
In research article [1], the researcher highlighted that a large amount of data
must be collected and stored for data analysis. Appropriate use of these collected
data leads to considerable gains in efficiency and therefore provides
economic advantages.
Several data mining techniques can be used in the field of agriculture. The
researchers in [2] implemented the K-means algorithm for forecasting
pollution in the atmosphere; the different possible changes of the weather
scenarios were then analyzed using Support Vector Machines (SVM).

The Multiple Linear Regression technique and a density-based clustering technique
were used by the researchers in [3] for crop yield analysis. Their work used the
entire set of existing gathered information for crop yield prediction and was
dedicated to finding a suitable approach for improving the efficiency of the
prediction.
VI. METHODOLOGY
Data was gathered in order to predict the future yield of rice crop. The
following table shows the summary of the rice crop production from year 2003 to
2014.
                                             AVERAGE  FERTILIZERS APPLIED
YEAR  AREA_HARVESTED  AREA_APPLIED  QUANTITY_APPLIED  UREA  AMMOSUL  AMMOPHOS  OTHERS          YIELD
2003    4,006,421.00  3,674,300.38               4.6  2.04     0.43      0.58    0.09  13,499,884.00
2004    4,126,645.00  3,756,976.99              4.68  2.02     0.44      0.64    0.08  14,496,784.00
2005    4,070,421.00  3,700,858.01              4.62  2.03     0.44      0.63     0.1  14,603,005.00
2006    4,159,930.00  3,815,585.60              4.76  1.99     0.47      0.72    0.09  15,326,706.00
2007    4,272,889.00  3,934,801.00              4.75  2.01     0.52      0.64     0.1  16,240,194.00
2008    4,459,977.00  4,225,049.00              4.16  1.86     0.51      0.45    0.08  16,815,548.00
2009    4,532,310.00  4,287,413.00              4.43  2.07     0.46      0.55       0  16,266,417.00
2010    4,354,161.00  3,935,498.34              4.58  2.12     0.51      0.53       0  15,772,319.00
2011    4,536,642.00  4,314,720.00               4.5  2.03     0.51      0.57       0  16,684,062.00
2012    4,689,960.00  4,572,108.10              4.66  2.02     0.52      0.54       0  18,032,525.47
2013    4,746,082.00  4,430,514.93              4.69  2.08     0.54      0.51       0  18,439,419.73
2014    4,739,672.16  4,402,248.79              5.06  2.24     0.57      0.52       0  18,967,826.17
Table 1. Summary of rice crop production based on estimated inorganic fertilizer use (Source:
http://countrystat.psa.gov.ph). Areas are in hectares, fertilizer quantities in bags of
50 kilograms, and yield in metric tons.
From the data shown in Table 1, multiple linear regression was
performed in MS Excel, with Yield as the dependent variable and area
harvested, area applied, average quantity applied, and the different fertilizers
applied as the independent variables.
After the first analysis, the following summary was obtained.

Regression Statistics
Multiple R          0.997118725
R Square            0.994245752
Adjusted R Square   0.984175818
Standard Error      208759.6733
Observations        12
ANOVA
            df  SS           MS           F         Significance F
Regression   7  3.01202E+13  4.30289E+12  98.73409  0.000258259
Residual     4  1.74322E+11  43580601198
Total       11  3.02946E+13
                          Coefficients  Standard Error  t Stat  P-value    Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                    -20239830       5106173.3    -4.0      0.0  -34416840.1  -6062820.6  -34416840.1   -6062820.6
AREA HARVESTED                     7.3             2.1     3.4      0.0          1.4        13.3          1.4         13.3
AREA APPLIED                      -0.7             1.5    -0.4      0.7         -4.8         3.5         -4.8          3.5
AVERAGE QUANTITY APPLIED      747599.2       1117653.7     0.7      0.5   -2355505.0   3850703.4   -2355505.0    3850703.4
UREA                         -500267.4       2980755.3    -0.2      0.9   -8776170.9   7775636.0   -8776170.9    7775636.0
AMMOSUL                      7383185.7       3826488.2     1.9      0.1   -3240848.8  18007220.2   -3240848.8   18007220.2
AMMOPHOS                      851303.4       2074780.8     0.4      0.7   -4909211.8   6611818.5   -4909211.8    6611818.5
OTHERS                       9497778.4       4707094.8     2.0      0.1   -3571211.9  22566768.8   -3571211.9   22566768.8
Figure 1. Summary output
For a more reliable result in predicting the rice crop yield, the variables
with a P-value of 0.15 or greater were removed. A P-value this high indicates
that a variable does not contribute significantly to predicting the outcome.
After checking the P-values, the independent variables retained are Area
Harvested, Ammosul, and the Other kinds of fertilizers applied; the rest of the
independent variables are excluded from the next analysis.
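The screening step can be sketched in a few lines of Python. This is only an illustration: the P-values are the rounded figures printed in Figure 1 (one decimal place), and the names `P_VALUES`, `THRESHOLD`, and `screen_predictors` are hypothetical.

```python
# P-values as printed in Figure 1 (rounded to one decimal place there).
P_VALUES = {
    "AREA HARVESTED": 0.0,
    "AREA APPLIED": 0.7,
    "AVERAGE QUANTITY APPLIED": 0.5,
    "UREA": 0.9,
    "AMMOSUL": 0.1,
    "AMMOPHOS": 0.7,
    "OTHERS": 0.1,
}

# Cut-off stated in the text: variables at or above it are removed.
THRESHOLD = 0.15


def screen_predictors(p_values, threshold=THRESHOLD):
    """Split predictors into those kept (p < threshold) and those dropped."""
    kept = [name for name, p in p_values.items() if p < threshold]
    dropped = [name for name, p in p_values.items() if p >= threshold]
    return kept, dropped


kept, dropped = screen_predictors(P_VALUES)
print("kept:   ", kept)
print("dropped:", dropped)
```

With the rounded values from Figure 1, the variables kept are the three that appear in Table 2.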
YEAR  AREA HARVESTED  AMMOSUL  OTHERS          YIELD
2003    4,006,421.00     0.43    0.09  13,499,884.00
2004    4,126,645.00     0.44    0.08  14,496,784.00
2005    4,070,421.00     0.44     0.1  14,603,005.00
2006    4,159,930.00     0.47    0.09  15,326,706.00
2007    4,272,889.00     0.52     0.1  16,240,194.00
2008    4,459,977.00     0.51    0.08  16,815,548.00
2009    4,532,310.00     0.46       0  16,266,417.00
2010    4,354,161.00     0.51       0  15,772,319.00
2011    4,536,642.00     0.51       0  16,684,062.00
2012    4,689,960.00     0.52       0  18,032,525.47
2013    4,746,082.00     0.54       0  18,439,419.73
2014    4,739,672.16     0.57       0  18,967,826.17
Table 2. Summary of new data
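The study ran the regression in MS Excel. As an independent check, an ordinary least squares fit on the Table 2 data can be sketched in plain Python. The sketch below solves the normal equations with exact rational arithmetic to avoid rounding problems caused by the very different scales of the columns; the names `TABLE_2` and `fit_ols` are hypothetical and this is not the author's procedure.

```python
from fractions import Fraction

# Table 2 rows: (area harvested in ha, ammosul, others, yield in metric tons).
# Values are kept as strings so Fraction reads them without rounding error.
TABLE_2 = [
    ("4006421.00", "0.43", "0.09", "13499884.00"),  # 2003
    ("4126645.00", "0.44", "0.08", "14496784.00"),  # 2004
    ("4070421.00", "0.44", "0.1", "14603005.00"),   # 2005
    ("4159930.00", "0.47", "0.09", "15326706.00"),  # 2006
    ("4272889.00", "0.52", "0.1", "16240194.00"),   # 2007
    ("4459977.00", "0.51", "0.08", "16815548.00"),  # 2008
    ("4532310.00", "0.46", "0", "16266417.00"),     # 2009
    ("4354161.00", "0.51", "0", "15772319.00"),     # 2010
    ("4536642.00", "0.51", "0", "16684062.00"),     # 2011
    ("4689960.00", "0.52", "0", "18032525.47"),     # 2012
    ("4746082.00", "0.54", "0", "18439419.73"),     # 2013
    ("4739672.16", "0.57", "0", "18967826.17"),     # 2014
]


def fit_ols(rows):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[Fraction(1)] + [Fraction(v) for v in row[:-1]] for row in rows]
    y = [Fraction(row[-1]) for row in rows]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination; exact because every entry is a Fraction.
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [float(A[i][k]) for i in range(k)]


intercept, b_area, b_ammosul, b_others = fit_ols(TABLE_2)
print(intercept, b_area, b_ammosul, b_others)
```

The four returned numbers are the intercept and the coefficients of area harvested, ammosul, and others, in that order.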
After the second analysis, the following results were obtained.
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.99220041
R Square            0.984461654
Adjusted R Square   0.978634774
Standard Error      242571.4658
Observations        12
ANOVA
            df  SS           MS           F            Significance F
Regression   3  2.98238E+13  9.94128E+12  168.9517706  1.42562E-07
Residual     8  4.70727E+11  58840915999
Total       11  3.02946E+13
                     Coefficients  Standard Error        t Stat      P-value     Lower 95%    Upper 95%   Lower 95.0%  Upper 95.0%
Intercept (i)        -14860650.34      2560071.49  -5.804779438  0.000402962  -20764185.78   -8957114.9  -20764185.78   -8957114.9
AREA HARVESTED (x1)   5.745114852     0.781755906   7.348988102  8.00168E-05     3.9423825  7.547847203     3.9423825  7.547847203
AMMOSUL (x2)          11162549.45     3167519.183   3.524066882  0.007799368   3858237.115  18466861.78   3858237.115  18466861.78
OTHERS (x3)           8612484.726     2993311.343   2.877243207  0.020598532   1709896.392  15515073.06   1709896.392  15515073.06
Figure 2. Summary of output after 2nd analysis
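The regression statistics in Figure 2 can be recomputed from its ANOVA sums of squares using the standard definitions of R Square, Adjusted R Square, Standard Error, and F. The following is a minimal sketch; the function name `regression_summary` is illustrative, and small differences from Figure 2 are expected because the sums of squares are printed with only six significant digits.

```python
def regression_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Recompute summary statistics from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n_obs - n_predictors - 1
    r_squared = ss_regression / ss_total
    # Adjusted R Square penalizes the number of predictors.
    adjusted = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    ms_regression = ss_regression / n_predictors
    ms_residual = ss_residual / df_residual
    return {
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
        "standard_error": ms_residual ** 0.5,
        "f_statistic": ms_regression / ms_residual,
    }


# SS values as printed in the ANOVA block of Figure 2 (12 observations, 3 predictors).
summary = regression_summary(2.98238e13, 4.70727e11, n_obs=12, n_predictors=3)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

The same function can be applied to the first analysis by passing the sums of squares from Figure 1 with `n_predictors=7`.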
Multiple linear regression is one of the most commonly used statistical
techniques; it uses several explanatory variables to predict the outcome of a
response variable. From the results shown in Figure 2, the following formula for
predicting the rice crop yield was derived using the principle of multiple linear
regression:
y = i + x1*xi1 + x2*xi2 + x3*xi3
where y is the yield prediction, i is the intercept, x1, x2, x3 are the
coefficients of Area Harvested, Ammosul, and Others listed in Figure 2, and xi1,
xi2, xi3 are the corresponding input values of those three variables.
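As a worked check, the formula can be evaluated with the intercept and coefficients reported in Figure 2 and the 2003 inputs from Table 2. The constant and function names below are hypothetical and serve only to illustrate the arithmetic.

```python
# Intercept and coefficients as reported in Figure 2.
INTERCEPT = -14860650.34
COEF_AREA_HARVESTED = 5.745114852
COEF_AMMOSUL = 11162549.45
COEF_OTHERS = 8612484.726


def predict_yield(area_harvested, ammosul, others):
    """Predicted yield in metric tons: intercept plus coefficient-weighted inputs."""
    return (
        INTERCEPT
        + COEF_AREA_HARVESTED * area_harvested
        + COEF_AMMOSUL * ammosul
        + COEF_OTHERS * others
    )


# Inputs for 2003: 4,006,421.00 ha harvested, ammosul 0.43, others 0.09.
prediction_2003 = predict_yield(4006421.00, 0.43, 0.09)
print(f"{prediction_2003:,.2f}")
```

The result agrees with the 13,731,718.34 listed for the same inputs in the paper's prediction table.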
The same data were used to predict the yield of the rice crop. An approximate
95% prediction interval (PI) was computed for each prediction, so that every
predicted yield is reported together with a lower and an upper bound.
Area Harvested  Ammosul  Others  Yield Prediction               Approx. Std. Error                   Approx. 95% PI
         (xi1)    (xi2)   (xi3)               (y)      t-value       of Prediction  Margin of Error    Lower Bound    Upper Bound  Interval Width
  4,006,421.00     0.43    0.09     13,731,718.34  2.306004135         266828.6123      615307.8834  13,116,410.45  14,347,026.22    1,230,615.77
  4,126,645.00     0.44    0.08     14,447,919.67  2.306004135         266828.6123      615307.8834  13,832,611.79  15,063,227.55    1,230,615.77
  4,070,421.00     0.44     0.1     14,297,156.03  2.306004135         266828.6123      615307.8834  13,681,848.14  14,912,463.91    1,230,615.77
  4,159,930.00     0.47    0.09     15,060,147.15  2.306004135         266828.6123      615307.8834  14,444,839.27  15,675,455.03    1,230,615.77
  4,272,889.00     0.52     0.1     16,353,361.90  2.306004135         266828.6123      615307.8834  15,738,054.01  16,968,669.78    1,230,615.77
  4,459,977.00     0.51    0.08     17,144,328.76  2.306004135         266828.6123      615307.8834  16,529,020.87  17,759,636.64    1,230,615.77
  4,532,310.00     0.46       0     16,312,763.90  2.306004135         266828.6123      615307.8834  15,697,456.01  16,928,071.78    1,230,615.77
  4,354,161.00     0.51       0     15,847,404.90  2.306004135         266828.6123      615307.8834  15,232,097.02  16,462,712.79    1,230,615.77
  4,536,642.00     0.51       0     16,895,779.21  2.306004135         266828.6123      615307.8834  16,280,471.32  17,511,087.09    1,230,615.77
  4,689,960.00     0.52       0     17,888,234.22  2.306004135         266828.6123      615307.8834  17,272,926.34  18,503,542.10    1,230,615.77
  4,746,082.00     0.54       0     18,433,912.55  2.306004135         266828.6123      615307.8834  17,818,604.66  19,049,220.43    1,230,615.77
  4,739,672.16     0.57       0     18,731,963.76  2.306004135         266828.6123      615307.8834  18,116,655.88  19,347,271.65    1,230,615.77
Table 3. Sample output applying the derived formula.
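The interval columns of Table 3 follow from the t-value and the approximate standard error of prediction: the margin of error is their product, and the bounds are the prediction minus and plus that margin. A minimal sketch, taking both inputs from Table 3 as given (the source does not state how the standard error of prediction was obtained; numerically it equals 1.1 times the Standard Error in Figure 2). The names used are hypothetical.

```python
# Values copied from Table 3.
T_VALUE = 2.306004135        # t-value column (8 residual degrees of freedom)
SE_PREDICTION = 266828.6123  # approx. standard error of prediction column


def prediction_interval(predicted_yield, t_value=T_VALUE, se=SE_PREDICTION):
    """Return (lower, upper) bounds of the approximate 95% prediction interval."""
    margin = t_value * se  # margin of error
    return predicted_yield - margin, predicted_yield + margin


# First row of Table 3.
lower, upper = prediction_interval(13731718.34)
width = upper - lower
print(f"{lower:,.2f} to {upper:,.2f} (width {width:,.2f})")
```

Because the same t-value and standard error are used for every row, the interval width is identical across Table 3.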
VII. RESULTS AND DISCUSSION
In this paper, rice crop yield was analyzed using multiple linear
regression. The actual values along with the predicted values from 2003 to 2014
are shown in Table 4. The percentage differences between predicted and actual
yield range from -2.09% to 1.96%; negative differences are shown in parentheses.
      Area Harvested  Ammosul  Others  Yield Prediction                 Percentage of
YEAR           (xi1)    (xi2)   (xi3)               (y)   Actual Yield     Difference
2003    4,006,421.00     0.43    0.09     13,731,718.34  13,499,884.00          1.72
2004    4,126,645.00     0.44    0.08     14,447,919.67  14,496,784.00         (0.34)
2005    4,070,421.00     0.44     0.1     14,297,156.03  14,603,005.00         (2.09)
2006    4,159,930.00     0.47    0.09     15,060,147.15  15,326,706.00         (1.74)
2007    4,272,889.00     0.52     0.1     16,353,361.90  16,240,194.00          0.70
2008    4,459,977.00     0.51    0.08     17,144,328.76  16,815,548.00          1.96
2009    4,532,310.00     0.46       0     16,312,763.90  16,266,417.00          0.28
2010    4,354,161.00     0.51       0     15,847,404.90  15,772,319.00          0.48
2011    4,536,642.00     0.51       0     16,895,779.21  16,684,062.00          1.27
2012    4,689,960.00     0.52       0     17,888,234.22  18,032,525.47         (0.80)
2013    4,746,082.00     0.54       0     18,433,912.55  18,439,419.73         (0.03)
2014    4,739,672.16     0.57       0     18,731,963.76  18,967,826.17         (1.24)
Table 4. Exact yield and predicted yield using multiple linear regression.
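The percentages in Table 4 are consistent with (predicted - actual) / actual x 100. That definition is inferred from the tabulated values rather than stated in the text, so the sketch below should be read under that assumption; `TABLE_4` and `percent_difference` are hypothetical names.

```python
# (year, predicted yield, actual yield) in metric tons, from Table 4.
TABLE_4 = [
    (2003, 13731718.34, 13499884.00),
    (2004, 14447919.67, 14496784.00),
    (2005, 14297156.03, 14603005.00),
    (2006, 15060147.15, 15326706.00),
    (2007, 16353361.90, 16240194.00),
    (2008, 17144328.76, 16815548.00),
    (2009, 16312763.90, 16266417.00),
    (2010, 15847404.90, 15772319.00),
    (2011, 16895779.21, 16684062.00),
    (2012, 17888234.22, 18032525.47),
    (2013, 18433912.55, 18439419.73),
    (2014, 18731963.76, 18967826.17),
]


def percent_difference(predicted, actual):
    """Signed difference of the prediction from the actual yield, in percent."""
    return (predicted - actual) / actual * 100.0


differences = {year: percent_difference(p, a) for year, p, a in TABLE_4}
lowest_year = min(differences, key=differences.get)
highest_year = max(differences, key=differences.get)
print(f"lowest:  {differences[lowest_year]:.2f}% ({lowest_year})")
print(f"highest: {differences[highest_year]:.2f}% ({highest_year})")
```

A positive value means the model over-predicted the yield for that year, and a negative value means it under-predicted.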
VIII. CONCLUSION
The multiple linear regression model was applied to the existing and
available data. The data gathered were based on the total rice crop area in all
regions of the Philippines. After the procedures carried out in this study, it
can be concluded that multiple linear regression was able to establish a
relationship between the dependent and independent variables that affect
the yield of the rice crop.
This work can be extended by considering other factors, such as weather
conditions and the amount of rainfall in the area, and by using other
statistical tools and data mining techniques.
REFERENCES
[1] G. Ruß, "Data Mining of Agricultural Yield Data: A Comparison of Regression Models",
Conference Proceedings, Advances in Data Mining – Applications and Theoretical Aspects,
P. Perner (Ed.), Lecture Notes in Artificial Intelligence 6171, Berlin, Heidelberg, Springer, 2009.
[2] H. Jorquera, R. Perez, A. Cipriano, G. Acuna, "Short Term Forecasting of Air Pollution Episodes",
in: P. Zannetti (Ed.), Environmental Modeling, WIT Press, UK, 2001.
[3] D. Ramesh, B. Vishnu Vardhan, "Analysis of Crop Yield Prediction using Data Mining
Techniques", International Journal of Research in Engineering and Technology, 2015.
Online references:
http://countrystat.psa.gov.ph
http://countrystat.psa.gov.ph/selection.asp
http://ricepedia.org/philippines
