
PREDICTION OF RICE CROP YIELD USING MULTIPLE LINEAR REGRESSION

By: Ma. Graciela Vic C. Elape

I. INTRODUCTION

Agriculture is one of the main sources of income in the Philippine economy. The Philippines is considered the world's eighth-largest rice producer, and rice is a major staple food for almost all Filipinos across the country. For the Philippines to become more productive in producing rice, it has to adopt existing technologies to increase its yield.

Estimating yield is one of the most common agricultural problems. Every farmer wants to know how much yield to expect, and farmers usually estimate their yield from previous experience. When the data from previous yields is turned into information, it becomes useful for many purposes, especially for predicting future yield.

Data mining is widely applied to solve agricultural problems. It is used extensively to analyze large data sets and establish patterns in them, so that the data can be transformed into a more understandable structure for future use.

The main objective of this paper is to create a platform for Filipino farmers that provides an analysis of rice production based on the available data. The multiple linear regression technique is used to predict crop yield so that productivity can be maximized.


II. SIGNIFICANCE OF THE STUDY

Planting rice is a major source of income and livelihood in some developing countries, especially the Philippines. Rice is also considered the most important crop, and the majority of Filipino farmers rely on it.

If farmers were able to know their future yield, they could also maximize their production, since they would already have an idea of how much fertilizer must be used.

III. OBJECTIVES OF THE STUDY

This study focuses on developing a platform that analyzes rice production data and predicts future yield using multiple linear regression.

Specifically, this study aims to:

1. Predict the future rice crop yield based on the following input data from year 2003 to 2014:

   • Year – specifies the year in which the data are available
   • Area harvested – specified in hectares
   • Area applied – specified in hectares
   • Average quantity applied – specifies the average amount of fertilizers in bags of 50 kilograms
   • Specific amount of fertilizers applied (urea, ammosul, ammophos, and others) – specified in bags of 50 kilograms
   • Yield – specified in metric tons

2. Derive a formula from the given input data in predicting the rice crop yield.

3. Predict the future yield using multiple linear regression method through MS Excel.

IV. SCOPE OF THE STUDY

The study focuses on predicting rice crop yield from the available national data. The data were gathered through CountryStat Philippines, a platform provided by the Philippine Statistics Authority (PSA) that is designed to generate crop information and other agricultural statistics.

The yield prediction was based on the following input data: year, area harvested, area applied, average quantity applied, the specific fertilizers applied (urea, ammosul, ammophos, and others), and yield. Multiple linear regression is used as the data mining technique for generating the desired output.

V. RELATED LITERATURE

In research article [1], the researcher highlighted that a large amount of data must be collected and stored for data analysis. Appropriate use of the collected data leads to considerable gains in efficiency, which in turn provide economic advantages.

Several data mining techniques can be used in the field of agriculture. In [2], the researchers implemented the K-means algorithm for forecasting pollution in the atmosphere; the different possible changes in weather scenarios were then analyzed using Support Vector Machines (SVM).


The multiple linear regression technique and a density-based clustering technique were used by the researchers in [3] for crop yield analysis. Their work indicated that crop yield prediction can be made from the entire set of existing gathered information, and it was dedicated to finding a suitable approach for improving the efficiency of the prediction.

VI. METHODOLOGY

Data were gathered in order to predict the future yield of rice crop. The following table shows the summary of rice crop production from year 2003 to 2014.

                                    AVERAGE           FERTILIZERS APPLIED
YEAR  AREA_HARVESTED  AREA_APPLIED  QUANTITY_APPLIED  UREA  AMMOSUL  AMMOPHOS  OTHERS  YIELD
2003  4,006,421.00    3,674,300.38  4.6               2.04  0.43     0.58      0.09    13,499,884.00
2004  4,126,645.00    3,756,976.99  4.68              2.02  0.44     0.64      0.08    14,496,784.00
2005  4,070,421.00    3,700,858.01  4.62              2.03  0.44     0.63      0.1     14,603,005.00
2006  4,159,930.00    3,815,585.60  4.76              1.99  0.47     0.72      0.09    15,326,706.00
2007  4,272,889.00    3,934,801.00  4.75              2.01  0.52     0.64      0.1     16,240,194.00
2008  4,459,977.00    4,225,049.00  4.16              1.86  0.51     0.45      0.08    16,815,548.00
2009  4,532,310.00    4,287,413.00  4.43              2.07  0.46     0.55      0       16,266,417.00
2010  4,354,161.00    3,935,498.34  4.58              2.12  0.51     0.53      0       15,772,319.00
2011  4,536,642.00    4,314,720.00  4.5               2.03  0.51     0.57      0       16,684,062.00
2012  4,689,960.00    4,572,108.10  4.66              2.02  0.52     0.54      0       18,032,525.47
2013  4,746,082.00    4,430,514.93  4.69              2.08  0.54     0.51      0       18,439,419.73
2014  4,739,672.16    4,402,248.79  5.06              2.24  0.57     0.52      0       18,967,826.17

Table 1. Summary of rice crop production based on estimated inorganic fertilizer use. Areas are in hectares, fertilizer quantities in bags of 50 kilograms, and yield in metric tons. (Source: http://countrystat.psa.gov.ph)

From the data shown in Table 1, multiple linear regression was performed in MS Excel, with yield as the dependent variable and area harvested, area applied, average quantity applied, and the different fertilizers applied (urea, ammosul, ammophos, and others) as the independent variables.

After the first analysis, the following summary was obtained.


Regression Statistics
Multiple R         0.997118725
R Square           0.994245752
Adjusted R Square  0.984175818
Standard Error     208759.6733
Observations       12

ANOVA
            df  SS           MS           F         Significance F
Regression  7   3.01202E+13  4.30289E+12  98.73409  0.000258259
Residual    4   1.74322E+11  43580601198
Total       11  3.02946E+13

                          Coefficients  Standard Error  t Stat  P-value  Lower 95%    Upper 95%    Lower 95.0%  Upper 95.0%
Intercept                 -20239830     5106173.3       -4.0    0.0      -34416840.1  -6062820.6   -34416840.1  -6062820.6
AREA HARVESTED            7.3           2.1             3.4     0.0      1.4          13.3         1.4          13.3
AREA APPLIED              -0.7          1.5             -0.4    0.7      -4.8         3.5          -4.8         3.5
AVERAGE QUANTITY APPLIED  747599.2      1117653.7       0.7     0.5      -2355505.0   3850703.4    -2355505.0   3850703.4
UREA                      -500267.4     2980755.3       -0.2    0.9      -8776170.9   7775636.0    -8776170.9   7775636.0
AMMOSUL                   7383185.7     3826488.2       1.9     0.1      -3240848.8   18007220.2   -3240848.8   18007220.2
AMMOPHOS                  851303.4      2074780.8       0.4     0.7      -4909211.8   6611818.5    -4909211.8   6611818.5
OTHERS                    9497778.4     4707094.8       2.0     0.1      -3571211.9   22566768.8   -3571211.9   22566768.8

Figure 1. Summary output
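
As a consistency check on Figure 1, the regression statistics can be recomputed from the ANOVA block alone. The sketch below is illustrative and not part of the original Excel workflow; the function name `anova_summary` is hypothetical, and the inputs are the sums of squares and degrees of freedom as printed (and rounded) in Figure 1.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute R^2, adjusted R^2, and the F statistic from an ANOVA table."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    r_squared = ss_regression / ss_total
    # Adjusted R^2 penalizes the fit for the number of predictors used
    adj_r_squared = 1 - ms_residual / (ss_total / df_total)
    f_stat = ms_regression / ms_residual
    return r_squared, adj_r_squared, f_stat


# Values copied from the ANOVA block of Figure 1 (rounded as printed there)
r2, adj_r2, f_stat = anova_summary(3.01202e13, 1.74322e11, 7, 4)
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}, F = {f_stat:.3f}")
```

Up to the rounding of the printed sums of squares, the three values agree with the R Square, Adjusted R Square, and F entries reported in Figure 1.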

For a more reliable prediction of the rice crop yield, the variables with a P-value of 0.15 or greater were removed. Variables with P-values this high do not contribute meaningfully to predicting the outcome and are therefore not considered significant. After checking the P-values, the independent variables retained were area harvested, ammosul, and the other kinds of fertilizers applied; the rest of the independent variables were excluded from the next analysis.
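
The selection step can be sketched in a few lines. This is an illustration only, assuming the P-values exactly as printed in Figure 1 (where they are rounded to one decimal place); the name `select_predictors` is hypothetical.

```python
# P-values as printed in Figure 1 (rounded to one decimal place there)
P_VALUES = {
    "AREA HARVESTED": 0.0,
    "AREA APPLIED": 0.7,
    "AVERAGE QUANTITY APPLIED": 0.5,
    "UREA": 0.9,
    "AMMOSUL": 0.1,
    "AMMOPHOS": 0.7,
    "OTHERS": 0.1,
}


def select_predictors(p_values, threshold=0.15):
    """Keep only the predictors whose P-value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


retained = select_predictors(P_VALUES)
print(retained)
```

With the 0.15 threshold used in this study, the retained predictors are area harvested, ammosul, and others.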
YEAR  AREA HARVESTED  AMMOSUL  OTHERS  YIELD
2003  4,006,421.00    0.43     0.09    13,499,884.00
2004  4,126,645.00    0.44     0.08    14,496,784.00
2005  4,070,421.00    0.44     0.1     14,603,005.00
2006  4,159,930.00    0.47     0.09    15,326,706.00
2007  4,272,889.00    0.52     0.1     16,240,194.00
2008  4,459,977.00    0.51     0.08    16,815,548.00
2009  4,532,310.00    0.46     0       16,266,417.00
2010  4,354,161.00    0.51     0       15,772,319.00
2011  4,536,642.00    0.51     0       16,684,062.00
2012  4,689,960.00    0.52     0       18,032,525.47
2013  4,746,082.00    0.54     0       18,439,419.73
2014  4,739,672.16    0.57     0       18,967,826.17

Table 2. Summary of new data

After the second analysis, the following results were obtained.

Regression Statistics
Multiple R         0.99220041
R Square           0.984461654
Adjusted R Square  0.978634774
Standard Error     242571.4658
Observations       12

ANOVA
            df  SS           MS           F            Significance F
Regression  3   2.98238E+13  9.94128E+12  168.9517706  1.42562E-07
Residual    8   4.70727E+11  58840915999
Total       11  3.02946E+13

                     Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%    Lower 95.0%   Upper 95.0%
Intercept (i)        -14860650.34  2560071.49      -5.804779438  0.000402962  -20764185.78  -8957114.9   -20764185.78  -8957114.9
AREA HARVESTED (x1)  5.745114852   0.781755906     7.348988102   8.00168E-05  3.9423825     7.547847203  3.9423825     7.547847203
AMMOSUL (x2)         11162549.45   3167519.183     3.524066882   0.007799368  3858237.115   18466861.78  3858237.115   18466861.78
OTHERS (x3)          8612484.726   2993311.343     2.877243207   0.020598532  1709896.392   15515073.06  1709896.392   15515073.06

Figure 2. Summary of output after 2nd analysis
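
The second analysis was run in MS Excel, but the same fit can be reproduced with ordinary least squares elsewhere. The following is a minimal pure-Python sketch, not the author's procedure: it assumes the Table 2 values as printed, standardizes the predictors before solving the normal equations (a numerical convenience, since area harvested is about seven orders of magnitude larger than the fertilizer columns), and uses the hypothetical function name `fit_ols`.

```python
# Rows of Table 2: (area harvested, ammosul, others, yield)
TABLE_2 = [
    (4006421.00, 0.43, 0.09, 13499884.00),
    (4126645.00, 0.44, 0.08, 14496784.00),
    (4070421.00, 0.44, 0.10, 14603005.00),
    (4159930.00, 0.47, 0.09, 15326706.00),
    (4272889.00, 0.52, 0.10, 16240194.00),
    (4459977.00, 0.51, 0.08, 16815548.00),
    (4532310.00, 0.46, 0.00, 16266417.00),
    (4354161.00, 0.51, 0.00, 15772319.00),
    (4536642.00, 0.51, 0.00, 16684062.00),
    (4689960.00, 0.52, 0.00, 18032525.47),
    (4746082.00, 0.54, 0.00, 18439419.73),
    (4739672.16, 0.57, 0.00, 18967826.17),
]


def fit_ols(rows, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    y_mean = sum(y) / n
    # Center and scale each predictor so the normal equations are well conditioned
    scales = [sum((r[j] - means[j]) ** 2 for r in rows) ** 0.5 for j in range(k)]
    z = [[(r[j] - means[j]) / scales[j] for j in range(k)] for r in rows]
    yc = [v - y_mean for v in y]
    # Augmented normal-equation matrix [Z'Z | Z'y]
    a = [
        [sum(z[i][p] * z[i][q] for i in range(n)) for q in range(k)]
        + [sum(z[i][p] * yc[i] for i in range(n))]
        for p in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution for the standardized coefficients
    gamma = [0.0] * k
    for r in reversed(range(k)):
        partial = sum(a[r][c] * gamma[c] for c in range(r + 1, k))
        gamma[r] = (a[r][k] - partial) / a[r][r]
    # Undo the scaling, then recover the intercept from the means
    slopes = [gamma[j] / scales[j] for j in range(k)]
    intercept = y_mean - sum(slopes[j] * means[j] for j in range(k))
    return intercept, slopes


predictors = [row[:3] for row in TABLE_2]
yields = [row[3] for row in TABLE_2]
intercept, slopes = fit_ols(predictors, yields)
print(intercept, slopes)
```

The intercept and the three slopes can be compared with the Coefficients column of Figure 2.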

Multiple linear regression is one of the most commonly used statistical techniques; it uses several explanatory variables to predict the outcome of a response variable. From the results shown in Figure 2, the following formula for predicting the rice crop yield was derived using the principle of multiple linear regression:

y = i + x1*xi1 + x2*xi2 + x3*xi3

where y is the yield prediction, i is the intercept, x1, x2, and x3 are the coefficients of area harvested, ammosul, and others as labelled in Figure 2, and xi1, xi2, and xi3 are the corresponding values of those independent variables for a given year.
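
As a worked example of the formula, the sketch below plugs in the intercept and coefficients printed in Figure 2 and evaluates the prediction for the 2003 inputs. The function and constant names are hypothetical and chosen only for illustration.

```python
# Intercept and coefficients as printed in Figure 2
INTERCEPT = -14860650.34
COEF_AREA_HARVESTED = 5.745114852
COEF_AMMOSUL = 11162549.45
COEF_OTHERS = 8612484.726


def predict_yield(area_harvested, ammosul, others):
    """Predicted yield (metric tons) from the fitted linear formula."""
    return (
        INTERCEPT
        + COEF_AREA_HARVESTED * area_harvested
        + COEF_AMMOSUL * ammosul
        + COEF_OTHERS * others
    )


# Inputs for 2003: area harvested (hectares), ammosul and others (bags of 50 kg)
prediction_2003 = predict_yield(4006421.00, 0.43, 0.09)
print(f"{prediction_2003:,.2f}")
```

For the 2003 inputs the result agrees, to within rounding of the printed coefficients, with the predicted yield of 13,731,718.34 reported for that year.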

The same data were used to predict the yield of rice crop. An approximate 95% prediction interval (PI) was computed for each prediction, so that each predicted yield is reported together with a lower bound and an upper bound.

Area Harvested  Ammosul  Others  Yield Prediction  t-value      Approx. Standard Error  Margin of Error  Approx. 95% PI  Approx. 95% PI  Interval Width
(xi1)           (xi2)    (xi3)   (y)                            of Prediction                            Lower Bound     Upper Bound
4,006,421.00    0.43     0.09    13,731,718.34     2.306004135  266828.6123             615307.8834      13,116,410.45   14,347,026.22   1,230,615.77
4,126,645.00    0.44     0.08    14,447,919.67     2.306004135  266828.6123             615307.8834      13,832,611.79   15,063,227.55   1,230,615.77
4,070,421.00    0.44     0.1     14,297,156.03     2.306004135  266828.6123             615307.8834      13,681,848.14   14,912,463.91   1,230,615.77
4,159,930.00    0.47     0.09    15,060,147.15     2.306004135  266828.6123             615307.8834      14,444,839.27   15,675,455.03   1,230,615.77
4,272,889.00    0.52     0.1     16,353,361.90     2.306004135  266828.6123             615307.8834      15,738,054.01   16,968,669.78   1,230,615.77
4,459,977.00    0.51     0.08    17,144,328.76     2.306004135  266828.6123             615307.8834      16,529,020.87   17,759,636.64   1,230,615.77
4,532,310.00    0.46     0       16,312,763.90     2.306004135  266828.6123             615307.8834      15,697,456.01   16,928,071.78   1,230,615.77
4,354,161.00    0.51     0       15,847,404.90     2.306004135  266828.6123             615307.8834      15,232,097.02   16,462,712.79   1,230,615.77
4,536,642.00    0.51     0       16,895,779.21     2.306004135  266828.6123             615307.8834      16,280,471.32   17,511,087.09   1,230,615.77
4,689,960.00    0.52     0       17,888,234.22     2.306004135  266828.6123             615307.8834      17,272,926.34   18,503,542.10   1,230,615.77
4,746,082.00    0.54     0       18,433,912.55     2.306004135  266828.6123             615307.8834      17,818,604.66   19,049,220.43   1,230,615.77
4,739,672.16    0.57     0       18,731,963.76     2.306004135  266828.6123             615307.8834      18,116,655.88   19,347,271.65   1,230,615.77

Table 3. Sample output applying the derived formula.
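
The interval columns of Table 3 follow from the t-value and the approximate standard error of prediction: the margin of error is their product, and the bounds are the prediction minus and plus that margin. The sketch below reproduces this arithmetic using the two constants as printed in Table 3. The paper does not state how those constants were obtained; the t-value is consistent with the 8 residual degrees of freedom in Figure 2, and the standard error of prediction is roughly 1.1 times the regression standard error in Figure 2, but both remarks are observations rather than the author's stated method. The function name `prediction_interval` is hypothetical.

```python
T_VALUE = 2.306004135        # t-value column of Table 3
SE_PREDICTION = 266828.6123  # approx. standard error of prediction, Table 3


def prediction_interval(predicted_yield, t_value=T_VALUE, se=SE_PREDICTION):
    """Return (lower bound, upper bound, interval width) of the approximate PI."""
    margin = t_value * se
    return predicted_yield - margin, predicted_yield + margin, 2 * margin


# First row of Table 3 (inputs for 2003)
lower, upper, width = prediction_interval(13731718.34)
print(f"{lower:,.2f}  {upper:,.2f}  {width:,.2f}")
```

Because the margin does not depend on the inputs, every row of Table 3 has the same interval width.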

VII. RESULTS AND DISCUSSION

In this paper, rice crop yield was analyzed using multiple linear regression. The actual values and the predicted values from year 2003 to 2014 are shown in Table 4. The percentage difference between the predicted and the actual yield ranges from -2.09% to 1.96%.

YEAR  Area Harvested  Ammosul  Others  Yield Prediction  Actual Yield   Percentage of
      (xi1)           (xi2)    (xi3)   (y)                              Difference
2003  4,006,421.00    0.43     0.09    13,731,718.34     13,499,884.00  1.72
2004  4,126,645.00    0.44     0.08    14,447,919.67     14,496,784.00  -0.34
2005  4,070,421.00    0.44     0.1     14,297,156.03     14,603,005.00  -2.09
2006  4,159,930.00    0.47     0.09    15,060,147.15     15,326,706.00  -1.74
2007  4,272,889.00    0.52     0.1     16,353,361.90     16,240,194.00  0.70
2008  4,459,977.00    0.51     0.08    17,144,328.76     16,815,548.00  1.96
2009  4,532,310.00    0.46     0       16,312,763.90     16,266,417.00  0.28
2010  4,354,161.00    0.51     0       15,847,404.90     15,772,319.00  0.48
2011  4,536,642.00    0.51     0       16,895,779.21     16,684,062.00  1.27
2012  4,689,960.00    0.52     0       17,888,234.22     18,032,525.47  -0.80
2013  4,746,082.00    0.54     0       18,433,912.55     18,439,419.73  -0.03
2014  4,739,672.16    0.57     0       18,731,963.76     18,967,826.17  -1.24

Table 4. Exact yield and predicted yield using multiple linear regression.
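
The percentage column of Table 4 is consistent with the difference between predicted and actual yield expressed relative to the actual yield. The sketch below recomputes it from the predicted and actual values as printed; the sign convention (predicted minus actual) is inferred from the table, and the function name `percent_difference` is hypothetical.

```python
# (year, predicted yield, actual yield) as printed in Table 4
TABLE_4 = [
    (2003, 13731718.34, 13499884.00),
    (2004, 14447919.67, 14496784.00),
    (2005, 14297156.03, 14603005.00),
    (2006, 15060147.15, 15326706.00),
    (2007, 16353361.90, 16240194.00),
    (2008, 17144328.76, 16815548.00),
    (2009, 16312763.90, 16266417.00),
    (2010, 15847404.90, 15772319.00),
    (2011, 16895779.21, 16684062.00),
    (2012, 17888234.22, 18032525.47),
    (2013, 18433912.55, 18439419.73),
    (2014, 18731963.76, 18967826.17),
]


def percent_difference(predicted, actual):
    """Signed difference of the prediction from the actual value, in percent."""
    return (predicted - actual) / actual * 100


diffs = {year: percent_difference(p, a) for year, p, a in TABLE_4}
for year, diff in diffs.items():
    print(f"{year}: {diff:+.2f}%")
print(f"range: {min(diffs.values()):.2f}% to {max(diffs.values()):.2f}%")
```

The smallest and largest values occur in 2005 and 2008, which gives the range quoted in the discussion.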

VIII. CONCLUSION

The multiple linear regression model was applied to the existing, available data, which were based on the total rice crop field in all regions of the Philippines. After the procedures described above, it can be concluded that multiple linear regression was able to establish a relationship between the dependent variable and the independent variables that affect the yield of rice crop.

This work can be extended by considering other factors, such as the weather conditions and the amount of rainfall in the area, and by using other statistical tools and data mining techniques.

REFERENCES

[1] G. Ruß, "Data Mining of Agricultural Yield Data: A Comparison of Regression Models", in P. Perner (Ed.), Advances in Data Mining – Applications and Theoretical Aspects, Lecture Notes in Artificial Intelligence 6171, Berlin, Heidelberg: Springer, 2009.

[2] H. Jorquera, R. Perez, A. Cipriano, G. Acuna, "Short Term Forecasting of Air Pollution Episodes", in P. Zannetti (Ed.), Environmental Modeling, WIT Press, UK, 2001.

[3] D. Ramesh, B. Vishnu Vardhan, "Analysis of Crop Yield Prediction using Data Mining Techniques", International Journal of Research in Engineering and Technology, 2015.

Online references:

http://countrystat.psa.gov.ph

http://countrystat.psa.gov.ph/selection.asp

http://ricepedia.org/philippines
