I. INTRODUCTION
considered the world's eighth-largest rice producer. Rice is also one
of the major staple foods for almost all Filipinos across the country. For the
Specifically, every farmer wants to know how much yield to expect.
Farmers usually estimate their yield from previous experience. When the
data from previous yields is turned into information, it becomes
useful for many purposes, especially predicting future yield.
used extensively to analyze large data sets and establish patterns in these data
sets, transforming them into a more understandable structure for future use.
The main objective of this paper is to create a platform for Filipino farmers
that provides an analysis of rice production based on the available data.
Multiple linear regression technique shall be used to predict the crop yield in order
most important crop, on which the majority of Filipino farmers rely.
If farmers knew their future yield, they would also be
able to maximize their production, since they would already have an idea of how
This study focuses on developing a platform that analyzes rice production
data and predicts future yield using multiple linear regression.
1. Predict the future rice crop yield based on the following input data from year
2003 to 2014:
bags of 50 kilograms
2. Derive a formula from the given input data in predicting the rice crop yield.
3. Predict the future yield using multiple linear regression method through MS
Excel.
The study is focused on predicting the yield of rice crop from the available
national data. The data was gathered through CountryStat Philippines which is a
designed to generate the desired crop information and other agricultural statistical
information.
The yield prediction was based on the input data year, area harvested, area
ammosul, ammophos, and others and the yield. Multiple linear regression method
will be used as the data mining technique in generating the desired output.
V. RELATED LITERATURE
In research article [1], the researcher highlighted that large amounts of data
must be collected and stored for data analysis. Appropriate use of the collected
data leads to considerable gains in efficiency, which in turn provide
economic advantages.
Several data mining techniques can be used in the field of agriculture. In [2],
the researchers implemented the K-means algorithm to forecast
pollution in the atmosphere and different possible changes of the weather scenarios
was used by the researchers [3] for crop yield analysis. Crop yield prediction can be
made with the entire set of existing gathered information and was dedicated to a
VI. METHODOLOGY
Data was gathered in order to predict the future yield of the rice crop. The
following table summarizes rice crop production from 2003 to
2014.
Table 1. Summary of rice crop production based on estimated inorganic fertilizer use (Source:
http://countrystat.psa.gov.ph)
From the data gathered shown in Table 1, multiple linear regression was
done through MS Excel where Yield is the dependent variable and the area
harvested, area applied, average quantity applied and the different fertilizers
ANOVA
df SS MS F Significance F
Regression 7 3.01202E+13 4.30289E+12 98.73409 0.000258259
Residual 4 1.74322E+11 43580601198
Total 11 3.02946E+13
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -20239830 5106173.3 -4.0 0.0 -34416840.1 -6062820.6 -34416840.1 -6062820.6
AREA HARVESTED 7.3 2.1 3.4 0.0 1.4 13.3 1.4 13.3
AREA APPLIED -0.7 1.5 -0.4 0.7 -4.8 3.5 -4.8 3.5
AVERAGE QUANTITY APPLIED 747599.2 1117653.7 0.7 0.5 -2355505.0 3850703.4 -2355505.0 3850703.4
UREA -500267.4 2980755.3 -0.2 0.9 -8776170.9 7775636.0 -8776170.9 7775636.0
AMMOSUL 7383185.7 3826488.2 1.9 0.1 -3240848.8 18007220.2 -3240848.8 18007220.2
AMMOPHOS 851303.4 2074780.8 0.4 0.7 -4909211.8 6611818.5 -4909211.8 6611818.5
OTHERS 9497778.4 4707094.8 2.0 0.1 -3571211.9 22566768.8 -3571211.9 22566768.8
For a more reliable prediction of the rice crop yield, variables
with a P-value of 0.15 or greater were removed. Variables with
such P-values were considered not to matter in predicting the outcome and therefore
variables now are Area Harvested, Ammosul, and Others (the other kinds of fertilizer
applied); the rest of the independent variables are excluded from the next
analysis.
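The variable selection described above can be sketched in a few lines of Python. This is only an illustration: the P-values are the rounded figures printed in the first coefficient table, the 0.15 cutoff is the one stated in the text, and the names `P_VALUES`, `THRESHOLD`, and `kept` are hypothetical.

```python
# Rounded P-values as printed in the first regression output.
P_VALUES = {
    "AREA HARVESTED": 0.0,
    "AREA APPLIED": 0.7,
    "AVERAGE QUANTITY APPLIED": 0.5,
    "UREA": 0.9,
    "AMMOSUL": 0.1,
    "AMMOPHOS": 0.7,
    "OTHERS": 0.1,
}

THRESHOLD = 0.15  # variables at or above this P-value are dropped

# Keep only the variables whose P-value is below the threshold.
kept = [name for name, p in P_VALUES.items() if p < THRESHOLD]
dropped = [name for name, p in P_VALUES.items() if p >= THRESHOLD]

print("kept:", kept)
print("dropped:", dropped)
```

With the printed values, the variables retained are Area Harvested, Ammosul, and Others, which matches the selection used in the second analysis.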
YEAR AREA HARVESTED AMMOSUL OTHERS YIELD
2003 4,006,421.00 0.43 0.09 13,499,884.00
2004 4,126,645.00 0.44 0.08 14,496,784.00
2005 4,070,421.00 0.44 0.1 14,603,005.00
2006 4,159,930.00 0.47 0.09 15,326,706.00
2007 4,272,889.00 0.52 0.1 16,240,194.00
2008 4,459,977.00 0.51 0.08 16,815,548.00
2009 4,532,310.00 0.46 0 16,266,417.00
2010 4,354,161.00 0.51 0 15,772,319.00
2011 4,536,642.00 0.51 0 16,684,062.00
2012 4,689,960.00 0.52 0 18,032,525.47
2013 4,746,082.00 0.54 0 18,439,419.73
2014 4,739,672.16 0.57 0 18,967,826.17
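The regression on the reduced table above was run in MS Excel. As a rough cross-check, the same ordinary least squares fit can be sketched in plain Python by solving the normal equations. This is not the authors' procedure: the helper names (`solve`, `fit_ols`, `r_squared`) are hypothetical, and Area Harvested is divided by 1e6 inside the fit purely as a numerical-conditioning assumption, with the coefficient rescaled afterwards.

```python
# (year, area harvested, ammosul, others, yield) copied from the table above.
ROWS = [
    (2003, 4006421.00, 0.43, 0.09, 13499884.00),
    (2004, 4126645.00, 0.44, 0.08, 14496784.00),
    (2005, 4070421.00, 0.44, 0.10, 14603005.00),
    (2006, 4159930.00, 0.47, 0.09, 15326706.00),
    (2007, 4272889.00, 0.52, 0.10, 16240194.00),
    (2008, 4459977.00, 0.51, 0.08, 16815548.00),
    (2009, 4532310.00, 0.46, 0.00, 16266417.00),
    (2010, 4354161.00, 0.51, 0.00, 15772319.00),
    (2011, 4536642.00, 0.51, 0.00, 16684062.00),
    (2012, 4689960.00, 0.52, 0.00, 18032525.47),
    (2013, 4746082.00, 0.54, 0.00, 18439419.73),
    (2014, 4739672.16, 0.57, 0.00, 18967826.17),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows):
    """Return [intercept, b_area, b_ammosul, b_others] from the normal equations."""
    # Area is scaled by 1e6 only to keep the normal equations well conditioned.
    X = [[1.0, area / 1e6, ammosul, others] for _, area, ammosul, others, _ in rows]
    y = [row[4] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    beta[1] /= 1e6  # undo the scaling so the coefficient applies to raw area
    return beta


def r_squared(rows, beta):
    """Coefficient of determination of the fitted model on the same rows."""
    y = [row[4] for row in rows]
    mean_y = sum(y) / len(y)
    fitted = [beta[0] + beta[1] * a + beta[2] * s + beta[3] * o for _, a, s, o, _ in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


beta = fit_ols(ROWS)
r2 = r_squared(ROWS, beta)
print("coefficients:", beta)
print("R squared:", r2)
```

If the printed table values are the ones that were fed to MS Excel, the coefficients should come out close to those reported for the second analysis; small differences would point to rounding in the table.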
After the second analysis, the following results were obtained.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.99220041
R Square 0.984461654
Adjusted R Square 0.978634774
Standard Error 242571.4658
Observations 12
ANOVA
df SS MS F Significance F
Regression 3 2.98238E+13 9.94128E+12 168.9517706 1.42562E-07
Residual 8 4.70727E+11 58840915999
Total 11 3.02946E+13
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept (i) -14860650.34 2560071.49 -5.804779438 0.000402962 -20764185.78 -8957114.9 -20764185.78 -8957114.9
AREA HARVESTED (x1) 5.745114852 0.781755906 7.348988102 8.00168E-05 3.9423825 7.547847203 3.9423825 7.547847203
AMMOSUL (x2) 11162549.45 3167519.183 3.524066882 0.007799368 3858237.115 18466861.78 3858237.115 18466861.78
OTHERS (x3) 8612484.726 2993311.343 2.877243207 0.020598532 1709896.392 15515073.06 1709896.392 15515073.06
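The summary statistics of the second analysis follow directly from its ANOVA table. The short sketch below recomputes them from the printed sums of squares, so small discrepancies in the last digits are expected because the table values are rounded. The variable names are illustrative only.

```python
import math

# Values copied from the ANOVA table of the second analysis.
n_obs = 12
df_reg, ss_reg = 3, 2.98238e13
df_res, ss_res = 8, 4.70727e11
ss_total = 3.02946e13

ms_reg = ss_reg / df_reg            # mean square of the regression
ms_res = ss_res / df_res            # mean square of the residuals
f_stat = ms_reg / ms_res            # F statistic
r2 = ss_reg / ss_total              # R Square
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_res  # Adjusted R Square
std_err = math.sqrt(ms_res)         # Standard Error of the regression

print(f"F = {f_stat:.2f}")
print(f"R Square = {r2:.4f}")
print(f"Adjusted R Square = {adj_r2:.4f}")
print(f"Standard Error = {std_err:.1f}")
```

The recomputed values agree with the reported F (168.95), R Square (0.9845), Adjusted R Square (0.9786), and Standard Error (about 242,571) up to rounding.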
technique that uses several variables to predict the desired outcome of a response
variable. From the generated results shown in Figure 2, the following formula for
predicting the rice crop yield was derived using the principle of multiple linear
y = i + x1*xi1 + x2*xi2 + x3*xi3
where y is the predicted yield, i is the intercept, x1, x2, and x3 are the
regression coefficients of Area Harvested, Ammosul, and Others, and xi1,
xi2, and xi3 are the corresponding input values for a given year.
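The formula can be applied directly with the intercept and coefficients reported in the second analysis. The sketch below is a minimal illustration; the function name `predict_yield` and the constant names are hypothetical, and the sample inputs are the 2003 values of Area Harvested, Ammosul, and Others.

```python
# Intercept and coefficients as reported in the second regression output.
INTERCEPT = -14860650.34
COEF_AREA = 5.745114852
COEF_AMMOSUL = 11162549.45
COEF_OTHERS = 8612484.726


def predict_yield(area_harvested, ammosul, others):
    """Evaluate the fitted linear formula for one set of inputs."""
    return (
        INTERCEPT
        + COEF_AREA * area_harvested
        + COEF_AMMOSUL * ammosul
        + COEF_OTHERS * others
    )


# Inputs for the year 2003.
prediction_2003 = predict_yield(4006421.00, 0.43, 0.09)
print(f"{prediction_2003:,.2f}")
```

For the 2003 inputs this evaluates to roughly 13.73 million, in line with the predicted value listed for 2003 in Table 4.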
The same data was used to predict the yield of the rice crop. An approximate
95% prediction interval (PI) was set. After calculation, yield prediction can be
In this paper, rice crop yield was analyzed using
multiple linear regression. The actual values and the
predicted values for 2003 to 2014 are shown in Table 4. The estimated results
YEAR Area Harvested (xi1) Ammosul (xi2) Others (xi3) Yield Prediction (y) Actual Yield Percentage of Difference
2003 4,006,421.00 0.43 0.09 13,731,718.34 13,499,884.00 1.72
2004 4,126,645.00 0.44 0.08 14,447,919.67 14,496,784.00 (0.34)
2005 4,070,421.00 0.44 0.1 14,297,156.03 14,603,005.00 (2.09)
2006 4,159,930.00 0.47 0.09 15,060,147.15 15,326,706.00 (1.74)
2007 4,272,889.00 0.52 0.1 16,353,361.90 16,240,194.00 0.70
2008 4,459,977.00 0.51 0.08 17,144,328.76 16,815,548.00 1.96
2009 4,532,310.00 0.46 0 16,312,763.90 16,266,417.00 0.28
2010 4,354,161.00 0.51 0 15,847,404.90 15,772,319.00 0.48
2011 4,536,642.00 0.51 0 16,895,779.21 16,684,062.00 1.27
2012 4,689,960.00 0.52 0 17,888,234.22 18,032,525.47 (0.80)
2013 4,746,082.00 0.54 0 18,433,912.55 18,439,419.73 (0.03)
2014 4,739,672.16 0.57 0 18,731,963.76 18,967,826.17 (1.24)
Table 4. Exact yield and predicted yield using multiple linear regression.
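The Percentage of Difference column in Table 4 is consistent with (predicted - actual) / actual x 100, with negative values shown in parentheses. The sketch below recomputes that column from the predicted and actual yields. It also reports the mean absolute percentage difference, a summary that does not appear in the original table and is added here only as an illustration.

```python
# Predicted and actual yields for 2003-2014, copied from Table 4.
YEARS = list(range(2003, 2015))
PREDICTED = [
    13731718.34, 14447919.67, 14297156.03, 15060147.15,
    16353361.90, 17144328.76, 16312763.90, 15847404.90,
    16895779.21, 17888234.22, 18433912.55, 18731963.76,
]
ACTUAL = [
    13499884.00, 14496784.00, 14603005.00, 15326706.00,
    16240194.00, 16815548.00, 16266417.00, 15772319.00,
    16684062.00, 18032525.47, 18439419.73, 18967826.17,
]

# Signed percentage difference of each prediction relative to the actual yield.
pct_diff = [(p - a) / a * 100 for p, a in zip(PREDICTED, ACTUAL)]

# Mean absolute percentage difference (not reported in the original table).
mean_abs_pct = sum(abs(d) for d in pct_diff) / len(pct_diff)

for year, d in zip(YEARS, pct_diff):
    print(f"{year}: {d:+.2f}%")
print(f"mean absolute difference: {mean_abs_pct:.2f}%")
```

Across the twelve years, every prediction falls within about 2.1 percent of the actual yield.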
VIII. CONCLUSION
Multiple linear regression was applied to the existing and
available data. The data gathered covered the total rice crop field in all
regions of the Philippines. After the procedures described above, it can be
concluded that multiple linear regression was able to establish a
relationship between the set dependent and independent variables that will take
On the other hand, this work can be extended by considering other
factors, such as weather conditions and the amount of rainfall in the
area, and by using other statistical tools and data mining techniques.
REFERENCES
[1] G Ruß, "Data Mining of Agricultural Yield Data : A Comparison of Regression Models",
Perner (Ed.), Lecture Notes in Artificial Intelligence 6171, Berlin, Heidelberg, Springer, 2009.
[2] Jorquera H, Perez R, Cipriano A, Acuna G, "Short Term Forecasting of Air Pollution Episodes",
[3] D Ramesh, B Vishnu Vardhan, “Analysis of Crop Yield Prediction using Data Mining
Online references:
http://countrystat.psa.gov.ph
http://countrystat.psa.gov.ph/selection.asp
http://ricepedia.org/philippines