
Andreas Toprac

3
Linear Regression Project
In the past two years, I have made $4000 from selling t-shirts. I use a site called
RedBubble that enables artists or designers to submit their work to be printed and sold on
things like t-shirts and mugs. Creating and selling designs on products has been an
important part of my life for the past two years, so I decided to run a linear regression on
the number of favorites and the number of sales I have on each shirt.
I used data from a total of 38 different shirts. After making a scatterplot with Favorites as
the explanatory variable and Sales as the response variable, I decided that it was appropriate
to run a linear regression because the graph appears to have a strong, linear, positive
relationship. I chose to make Favorites the explanatory variable because it makes sense that
the more favorites a shirt has, the more sales it is expected to have. After running a linear
regression, I found that the y-intercept was 1.858. This means that when a shirt has 0
favorites, it is expected to have almost 2 sales. The slope was 1.794, which means that for
every 1 more favorite a shirt gets, the number of sales is expected to go up by 1.794.
R-squared was 0.925, which means that 92.5% of the variance in Sales is explained by
Favorites. R is 0.962, which confirms that a linear regression was appropriate because the two
variables actually do have a very strong, positive, linear relationship. The best-fitting
linear model for the data is ŷ = 1.858 + 1.794x.
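
The interpretation of the intercept, slope, and R can be checked with a short R sketch. It uses only the rounded values reported above; the names `predict_sales` and `r_value` are made up for illustration and are not part of the original analysis.

```r
# Rounded values reported in the text
intercept <- 1.858
slope     <- 1.794
r_squared <- 0.925

# Fitted line: predicted sales for a given number of favorites
# (predict_sales is a hypothetical helper name)
predict_sales <- function(favorites) intercept + slope * favorites

# Expected sales for a shirt with 0 favorites is just the intercept
sales_at_zero <- predict_sales(0)

# One extra favorite raises the prediction by exactly the slope
gain_per_favorite <- predict_sales(1) - predict_sales(0)

# In simple linear regression, R is the square root of R-squared
# and takes the sign of the slope
r_value <- sign(slope) * sqrt(r_squared)
print(round(r_value, 3))
```

The last line recovers the reported R of 0.962 from the reported R-squared.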
There are several outliers in both the x and y variables. The 6 outliers for Favorites are
29, 31, 52, 59, 84, and 105. The 4 outliers in Sales are 119, 115, 207, and 126. I identified
these outliers by generating a boxplot for each variable and then finding all of the points
that are shown as outliers in the boxplot. I found that there were 3 influential points in the
data set: (31, 28), (3, 41), and (84, 126). After removing these points from the data, the
y-intercept decreased to 0.479, the slope increased to 1.980, R-squared increased to 0.965,
and R increased to 0.982. This means that the correlation between the two variables became
stronger when the influential points were taken out. The residual plot of the data does not
appear to have a clear pattern. This means that a linear model provides a good fit for the
data.
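
As a rough sketch of how a boxplot flags outliers, base R's `boxplot.stats()` returns the points lying more than 1.5 times the box length beyond the hinges, which are the same points `boxplot()` draws individually. The helper name `boxplot_outliers` and the vector `toy_values` are made up for illustration; the numbers are not the shirt data.

```r
# Hypothetical helper: return the values a default boxplot would mark as outliers
boxplot_outliers <- function(values) {
  boxplot.stats(values)$out
}

# Made-up illustrative values (NOT the Favorites or Sales data)
toy_values <- c(1, 2, 3, 4, 5, 100)

flagged <- boxplot_outliers(toy_values)
print(flagged)
```

Applied to the project data, the same call on `toprac$Favs` and `toprac$Sales` would be expected to list the points shown as outliers in the two boxplots.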
I randomly selected the value 10 and plugged it into my equation. This gave me
1.858 + 1.794(10) = 19.798. The actual number of sales for 10 favorites is 30. This means that
the residual is 30 - 19.798 = 10.202, which means that my equation underestimated the actual
value.
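
The same residual calculation, written out in R with the values from the text (the variable names are chosen here for illustration):

```r
# Coefficients of the fitted line and the observed point from the text
intercept    <- 1.858
slope        <- 1.794
favorites    <- 10
actual_sales <- 30

# Predicted sales from the fitted line
predicted_sales <- intercept + slope * favorites

# Residual = actual - predicted; a positive residual means the line underestimates
residual <- actual_sales - predicted_sales

print(predicted_sales)
print(residual)
```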
I then ran an exponential regression and a logistic regression against my data. The
exponential regression came up with an R-squared value of 0.090, which means that it does not
fit my data at all. The logistic regression came up with an R-squared value of 0.916, which is
very close to, but still below, my linear R-squared value of 0.925. This means that a linear
regression is the best fit for my data.
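
The comparison can be sketched in R by putting the three reported R-squared values in a named vector and picking the largest. The vector name `r_squared_by_model` is made up for illustration.

```r
# R-squared values reported in the text for each model
r_squared_by_model <- c(linear = 0.925, exponential = 0.090, logistic = 0.916)

# Model with the highest R-squared
best_model <- names(which.max(r_squared_by_model))

# Gap between the linear fit and the runner-up (logistic)
gap_to_logistic <- r_squared_by_model[["linear"]] - r_squared_by_model[["logistic"]]

print(best_model)
print(gap_to_logistic)
```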
A career that would use this type of analysis is any career that works through social media
and advertising, for example in retail businesses like clothing stores. My variable of
Favorites could be translated to likes on a post advertising a product or views on a
commercial. This data on views or likes could then be used in the same way to predict the
number of sales of different products. This career is relevant because it can show how
advertising products can have an impact on sales. This could then be connected to how much
money should be spent on advertising to achieve the maximum profit from sales.
The two variables I used, Favorites and Sales for t-shirts, showed a strong, linear, positive
correlation. This means that generally as the number of favorites went up, the number of sales
also went up. But this does not mean that the number of favorites causes the number of sales,
because correlation does not necessarily equal causation.

Works Cited
"Manage Portfolio." Redbubble. Redbubble, n.d. Web. 17 Nov. 2015.

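R Code
> # Linear fit of Sales on Favorites for all 38 shirts
> linFit(toprac$Favs, toprac$Sales)
Intercept = 1.85757
Slope = 1.794
Rsquared = 0.925
> # Scatterplot with the least-squares line (the plot must exist before abline is called)
> plot(toprac$Favs, toprac$Sales, main = "Favorites vs. Sales for T-shirts",
+      xlab = "Number of Favorites", ylab = "Number of Sales")
> abline(lm(toprac$Sales ~ toprac$Favs))
> # Residuals plotted against Favorites, matching the x-axis label
> model <- lm(toprac$Sales ~ toprac$Favs)
> plot(toprac$Favs, resid(model), xlab = "Number of Favorites", ylab = "Residuals",
+      main = "Residual Plot for Favorites vs. Sales")
> # Boxplots used to identify outliers in each variable
> boxplot(toprac$Favs, main = "Boxplot for Favorites", ylab = "Favorites")
> boxplot(toprac$Sales, main = "Boxplot for Sales", ylab = "Sales")
> # Same data with the 3 influential points removed
> toprac2 <- read.csv("~/Toprac,A_Project2Data2.csv")
> View(toprac2)
> linFit(toprac2$Favs, toprac2$Sales)
Intercept = 0.47917
Slope = 1.97955
Rsquared = 0.96477
> # Exponential and logistic fits on the full data, for comparison
> expFit(toprac$Favs, toprac$Sales)
a = 0.27517
b = 1.09736
Rsquared = 0.09047
> logisticFit(toprac$Favs, toprac$Sales)
Logistic Fit
C = 185.1709
a = 21.35945
b = 1.06323
Rsquared = 0.91573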
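
`linFit` is not a base R function. Assuming it reports the least-squares intercept, slope, and R-squared, a rough base-R equivalent might look like the sketch below. The name `lin_fit_summary` is hypothetical, and the toy vectors are made-up values used only to show the call; they are not the shirt data.

```r
# Hypothetical stand-in for linFit: least-squares intercept, slope, and R-squared
lin_fit_summary <- function(x, y) {
  model <- lm(y ~ x)
  coefs <- coef(model)
  list(
    intercept = unname(coefs[1]),
    slope     = unname(coefs[2]),
    r_squared = summary(model)$r.squared
  )
}

# Made-up illustrative values (NOT the project data)
toy_x <- 1:5
toy_y <- c(2, 4, 5, 4, 5)

fit <- lin_fit_summary(toy_x, toy_y)
print(fit)
```

With the project data loaded, the corresponding call would be `lin_fit_summary(toprac$Favs, toprac$Sales)`.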
