12/24/2015

How can I become a data scientist? - Quora
William Chen, Data Scientist at Quora
631.1k Views. Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
William is a Most Viewed Writer in Data Science.

Here are some amazing and completely free resources online that you can
use to teach yourself data science.
Besides this page, I would highly recommend the Official Quora Data Science FAQ as your
comprehensive guide to data science! It includes resources similar to this one, as well as
advice on preparing for data science interviews. Additionally, follow the Quora Data
Science topic if you haven't already to get updates on new questions and answers!

Fulfill your prerequisites
Before you begin, you need Multivariable Calculus, Linear Algebra, and Python. If your
math background is up to multivariable calculus and linear algebra, you'll have enough
background to understand almost all of the probability/statistics/machine learning for
the job.
Multivariate Calculus: What are the best resources for mastering multivariable
calculus?
Numerical Linear Algebra / Computational Linear Algebra / Matrix Algebra:
Linear Algebra, Coursera (starts 2/2/2015)
Multivariate calculus is useful for some parts of machine learning and a lot of probability.
Linear/Matrix algebra is absolutely necessary for a lot of concepts in machine learning.

You also need some programming background to begin, preferably in Python. Most other
things on this guide can be learned on the job (like random forests, pandas, A/B testing),
but you can't get away without knowing how to program!
Python is the most important language for a data scientist to learn. To learn to
code, more about Python, and why Python is so important, check out:
How do I learn to code?
How do I learn Python?
Why is Python a language of choice for data scientists?
Is Python the most important programming language to learn for aspiring data
scientists & data miners?

If you're currently in school, take statistics and computer science classes. Check
out What classes should I take if I want to become a data scientist?

Plug Yourself Into the Community
Check out Meetup to find groups that interest you! Attend an interesting talk, learn about
data science live, and meet data scientists and other aspiring data scientists. Start
reading data science blogs and following influential data scientists:
What are the best blogs about data?
What is your source of machine learning and data science news? Why?

https://www.quora.com/HowcanIbecomeadatascientist1

Data Science: what are some best users/agencies to follow on Twitter, Facebook,
G+, and LinkedIn?
What are the best Twitter accounts about data?

Set up and Learn to use your tools
Python
Install Python, IPython, and related libraries (guide)
How do I learn Python?
R
Install R and RStudio (I would say that R is the second most important
language. It's good to know both Python and R)
Learn R with swirl

Sublime Text
Install Sublime Text
What's the best way to learn to use Sublime Text?
SQL
How do I learn SQL? (You can practice it using the sqlite package in Python)
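
As a rough illustration of practicing SQL from Python: the standard library ships SQLite support as the sqlite3 module, so no installation is needed. The table name, column names, and rows below are made up for the example; this is a minimal sketch, not a recommended schema.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of page visits (names and values invented for practice).
conn.execute("CREATE TABLE visits (user_name TEXT, page TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("ann", "home", 30),
        ("ann", "pricing", 45),
        ("bob", "home", 10),
        ("cat", "home", 20),
        ("cat", "signup", 60),
    ],
)

# Aggregate per user: number of page views and total time on site.
rows = conn.execute(
    """
    SELECT user_name, COUNT(*) AS views, SUM(seconds) AS total_seconds
    FROM visits
    GROUP BY user_name
    ORDER BY total_seconds DESC
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Swapping in your own CREATE TABLE and SELECT statements is an easy way to try joins, subqueries, and window functions without setting up a database server.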

Learn Probability and Statistics
Be sure to go through a course that involves heavy application in R or Python. Knowing
probability and statistics will only really be helpful if you can implement what you learn.
Python Application: Think Stats (free pdf) (Python focus)
R Applications: An Introduction to Statistical Learning (free pdf) (MOOC)
(R focus)
Print out a copy of Probability Cheatsheet

Complete Harvard's Data Science Course
As of Fall 2015, the course is in its third year and strives to be as applicable and
helpful as possible for students who are interested in becoming data scientists. An example
of how this is happening is the introduction of Spark and SQL starting this year.
This course is developed in part by a fellow Quora user, Professor Joe Blitzstein. Here are
all of the materials!
Intro to the class
What is it like to design a data science class?
What is it like to take CS109/Statistics 121 (Data Science) at Harvard?

Course Materials
Class main page: CS109 Data Science
Lectures, Slides, and Labs: Class Material

Assignments
Intro to Python, Numpy, Matplotlib (Homework 0) (Solutions)
Poll Aggregation, Web Scraping, Plotting, Model Evaluation, and
Forecasting (Homework 1) (Solutions)
Data Prediction, Manipulation, and Evaluation (Homework 2)
(Solutions)
Predictive Modeling, Model Calibration, Sentiment Analysis
(Homework 3) (Solutions)
Recommendation Engines, Using MapReduce (Homework 4)
(Solutions)
Network Visualization and Analysis (Homework 5) (Solutions)

Labs
(these are the 2013 labs. For the 2015 labs, check out Class Material)
Lab 2: Web Scraping
Lab 3: EDA, Pandas, Matplotlib
Lab 4: Scikit-Learn, Regression, PCA
Lab 5: Bias, Variance, Cross-Validation
Lab 6: Bayes, Linear Regression, and Metropolis Sampling
Lab 7: Gibbs Sampling
Lab 8: MapReduce
Lab 9: Networks
Lab 10: Support Vector Machines

Do most of Kaggle's Getting Started and Playground Competitions
I would NOT recommend doing any of the prize-money competitions. They usually have
datasets that are too large, complicated, or annoying, and are not good for learning
(Kaggle.com)
Start by learning scikit-learn, playing around, reading through tutorials and forums at
Data Science London + Scikit-learn for a simple, synthetic, binary classification task.
Next, play around some more and check out the tutorials for Titanic: Machine Learning
from Disaster with a slightly more complicated binary classification task (with
categorical variables, missing values, etc.)
Afterwards, try some multi-class classification with Forest Cover Type Prediction.
Now, try a regression task, Bike Sharing Demand, that involves incorporating
timestamps. Try out some natural language processing with Sentiment Analysis on
Movie Reviews. Finally, try out any of the other knowledge-based competitions that
interest you!

Learn Some Data Science Electives
Product Metrics will teach you about what companies track, what metrics they
find important, and how companies measure their success: The 27 Metrics in
Pinterest's Internal Growth Dashboard
Optimization will help you with understanding statistics and machine learning:
Convex Optimization - Boyd and Vandenberghe
A/B Testing is just a rebranded version of what pharmaceutical companies have
been doing for decades. Learn more about A/B testing here: How do I learn
about A/B testing?
Visualization: I would recommend picking up ggplot2 in R to make simple yet
beautiful graphics and just browsing DataIsBeautiful (/r/dataisbeautiful) and
FlowingData for ideas and inspiration.
User Behavior: This set of blog posts looks useful and interesting: This
Explains Everything: User Behavior
Feature Engineering: Check out MLconf 2015 Seattle: What are some best practices in
Feature Engineering? and this great example: http://nbviewer.ipython.org/gith...
Big Data Technologies: These are tools and frameworks developed specifically to deal
with massive amounts of data. How do I learn big data technologies?
Machine Learning: How do I learn machine learning? This is an extremely rich
area with massive amounts of potential. Andrew Ng's Machine Learning course on
Coursera is one of the most popular MOOCs, and a great way to start! Andrew Ng's
Machine Learning MOOC
Natural Language Processing: This is the practice of turning text data into numerical
data whilst still preserving the "meaning". Learning this will let you analyze new, exciting
forms of data. How do I learn Natural Language Processing (NLP)?
Time Series Analysis: How do I learn about time series analysis?
Building a Data Culture: http://www.oreilly.com/data/free...
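
To make the Natural Language Processing elective above a little more concrete, here is a minimal sketch of one very simple way to turn text into numbers: a bag-of-words count vector. The three example sentences are invented, and this is only one of many possible representations. It keeps word counts but throws away word order, so it preserves "meaning" only crudely.

```python
from collections import Counter

# Made-up example documents.
docs = ["the movie was great", "the movie was not great", "terrible plot"]

# Vocabulary: every distinct word, in sorted order, gets one column.
vocabulary = sorted({word for doc in docs for word in doc.split()})


def to_vector(doc):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


vectors = [to_vector(doc) for doc in docs]

print(vocabulary)
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Note that the first two sentences differ in only one column ("not"), even though they say opposite things, which hints at why more careful representations are needed in practice.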

Do a Capstone Product/Side Project
Use your new data science and software engineering skills to build something that will
make other people say wow! This can be a website, new way of looking at a dataset, cool
visualization, or anything!
Data Science: What are some good toy problems in data science?
Recommendation Systems: How can I start building a recommendation engine?
What are some ideas for a quick weekend Python project?
What is a good measure of the influence of a Twitter user?
Where can I find large datasets open to the public?
What are some good algorithms for a prioritized inbox?
What are some good data science projects?
Create public GitHub repositories, make a blog, and post your work, side projects, Kaggle
solutions, insights, and thoughts! This helps you gain visibility, build a portfolio for your
resume, and connect with other people working on the same tasks.

Get a Data Science Internship or Job
How do I prepare for a data scientist interview?
How should I prepare for statistics questions for a data science interview?
What kind of A/B testing questions should I expect in a data scientist interview
and how should I prepare for such questions?
What companies have data science internships?
What are some tips to choose whether I want to apply for a Data Science or
Software Engineering internship?
When is the best time to apply for data science summer internships?

Check out The Official Quora Data Science FAQ for more discussion on internships, jobs,
and data science interview processes! The data science FAQ also links to more specific
versions of this question, like How do I become a data scientist without a PhD? or the
counterpart, How do I become a data scientist as a PhD student?

Think like a Data Scientist
In addition to the concrete steps I listed above to develop the skill set of a data scientist, I
include seven challenges below so you can learn to think like a data scientist and
develop the right attitude to become one.

(1) Satiate your curiosity through data
As a data scientist you write your own questions and answers. Data scientists
are naturally curious about the data that they're looking at, and are creative with ways to
approach and solve whatever problem needs to be solved.
Much of data science is not the analysis itself, but discovering an interesting
question and figuring out how to answer it.
Here are two great examples:
Hilary: the most poisoned baby name in US history
A Look at Fire Response Data

Challenge: Think of a problem or topic you're interested in and answer it with data!

(2) Read news with a skeptical eye
Much of the contribution of a data scientist (and why it's really hard to replace a data
scientist with a machine) is that a data scientist will tell you what's important and what's
spurious. This persistent skepticism is healthy in all sciences, and is especially necessary in
a fast-paced environment where it's too easy to let a spurious result be misinterpreted.
You can adopt this mindset yourself by reading news with a critical eye. Many news
articles have inherently flawed main premises. Try these two articles. Sample
answers are available in the comments.
Easier: You Love Your iPhone. Literally.
Harder: Who predicted Russia's military intervention?
Challenge: Do this every day when you encounter a news article. Comment on the article
and point out the flaws.

(3) See data as a tool to improve consumer products
Visit a consumer internet product (probably one that you know doesn't do extensive A/B
testing already), and then think about their main funnel. Do they have a checkout funnel?
Do they have a signup funnel? Do they have a virality mechanism? Do they have an
engagement funnel?
Go through the funnel multiple times and hypothesize about different ways it could do
better to increase a core metric (conversion rate, shares, signups, etc.). Design an
experiment to verify if your suggested change can actually change the core metric.
Challenge: Share it with the feedback email for the consumer internet site!
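
One common way to check whether a change moved a conversion rate is a two-proportion z-test on the control and variant groups. The sketch below is only an illustration under assumptions: the function name and the visitor and conversion counts are hypothetical, and a real experiment also needs a sample size chosen in advance and properly randomized assignment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 5000 visitors per arm, 200 vs 250 signups.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise alone; it does not say how large or how valuable the improvement is.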

(4) Think like a Bayesian
To think like a Bayesian, avoid the base rate fallacy. This means that to form new beliefs you
must incorporate both newly observed information AND prior information formed through
intuition and experience.
Checking your dashboard, user engagement numbers are significantly down
today. Which of the following is most likely?
1. Users are suddenly less engaged
2. Feature of site broke
3. Logging feature broke
Even though explanation #1 completely explains the drop, #2 and #3 should be more
likely because they have a much higher prior probability.
You're in senior management at Tesla, and five of Tesla's Model S's have
caught fire in the last five months. Which is more likely?
1. Manufacturing quality has decreased and Teslas should now be deemed unsafe.
2. Safety has not changed and fires in Tesla Model S's are still much rarer than their
counterparts in gasoline cars.
While #1 is an easy explanation (and great for media coverage), your prior should be
strong on #2 because of your regular quality testing. However, you should still be seeking
information that can update your beliefs on #1 versus #2 (and still find ways to improve
safety). Question for thought: what information should you seek?
Challenge: Identify the last time you committed the Base Rate Fallacy. Avoid committing
the fallacy from now on.
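
The dashboard example can be worked through with Bayes' rule. All of the numbers below are invented purely for illustration (the answer gives no figures): the priors say how often each explanation tends to be the real cause, and the likelihoods say how well each explanation predicts a sharp one-day drop.

```python
# Hypothetical priors: how often each explanation is the true cause of a drop.
priors = {
    "users less engaged": 0.05,
    "site feature broke": 0.35,
    "logging broke": 0.60,
}

# Hypothetical likelihoods: P(sharp one-day drop on the dashboard | explanation).
likelihoods = {
    "users less engaged": 1.0,
    "site feature broke": 0.8,
    "logging broke": 0.9,
}

# Bayes' rule: posterior is proportional to prior times likelihood.
evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

for hypothesis, posterior in posteriors.items():
    print(f"{hypothesis}: {posterior:.3f}")
```

Even though "users less engaged" explains the observation perfectly (likelihood 1.0), its low prior keeps its posterior small under these assumed numbers.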

(5) Know the limitations of your tools
Knowledge is knowing that a tomato is a fruit, wisdom is not putting it in a fruit salad.
- Miles Kington
Knowledge is knowing how to perform an ordinary linear regression, wisdom is realizing
how rarely it applies cleanly in practice.
Knowledge is knowing five different variations of K-means clustering, wisdom is realizing
how rarely actual data can be cleanly clustered, and how poorly K-means clustering can
work with too many features.
Knowledge is knowing a vast range of sophisticated techniques, but wisdom is being able to
choose the one that will provide the most impact for the company in a
reasonable amount of time.
You may develop a vast range of tools while you go through your Coursera or EdX courses,
but your toolbox is not useful until you know which tools to use.
Challenge: Apply several tools to a real dataset and discover the tradeoffs and limitations
of each tool. Which tools worked best, and can you figure out why?
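
A tiny example of a tool failing quietly: ordinary least squares fits a straight line perfectly to linear data, but reports "no relationship" for data that follows an exact curve. The helper name and the toy data are made up for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = fit_line(xs, [2 * x + 1 for x in xs])  # y depends linearly on x
curved = fit_line(xs, [x * x for x in xs])      # y depends on x, but not linearly

print("linear data:", linear)
print("curved data:", curved)
```

On the curved data the fitted slope and R^2 come out at zero even though y is completely determined by x, which is why plotting the data before trusting a summary statistic is worth the time.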

(6) Teach a complicated concept
How does Richard Feynman distinguish which concepts he understands and which
concepts he doesn't?
Feynman was a truly great teacher. He prided himself on being able to devise ways to
explain even the most profound ideas to beginning students. Once, I said to him, "Dick,
explain to me, so that I can understand it, why spin one-half particles obey Fermi-Dirac
statistics." Sizing up his audience perfectly, Feynman said, "I'll prepare a freshman
lecture on it." But he came back a few days later to say, "I couldn't do it. I couldn't
reduce it to the freshman level. That means we don't really understand it." - David L.
Goodstein, Feynman's Lost Lecture: The Motion of Planets Around the Sun
What distinguished Richard Feynman was his ability to distill complex concepts into
comprehensible ideas. Similarly, what distinguishes top data scientists is their ability to
cogently share their ideas and explain their analyses.
Check out https://www.quora.com/EdwinChen... for examples of cogently explained
technical concepts.
Challenge: Teach a technical concept to a friend or on a public forum, like Quora or
YouTube.

(7) Convince others about what's important
Perhaps even more important than a data scientist's ability to explain their analysis is their
ability to communicate the value and potential impact of the actionable
insights.
Certain tasks of data science will be commoditized as data science tools
become better and better. New tools will make obsolete certain tasks such as writing
dashboards, unnecessary data wrangling, and even specific kinds of predictive modeling.
However, the need for a data scientist to extract and communicate what's
important will never be made obsolete. With increasing amounts of data and potential
insights, companies will always need data scientists (or people in data-science-like roles)
to triage all that can be done and prioritize tasks based on impact.
The data scientist's role in the company is to serve as the ambassador between the
data and the company. The success of a data scientist is measured by how well he/she
can tell a story and make an impact. Every other skill is amplified by this ability.
Challenge: Tell a story with statistics. Communicate the important findings in a dataset.
Make a convincing presentation that your audience cares about.
If you liked this answer, please consider:
1. Following the Official Quora Data Science FAQ
2. Following me (William Chen) and my Quora blog at Storytelling with Statistics to
get notified when I post more content like this!
3. Sharing this post with your friends!
Updated Nov 27

Pronojit Saha, Data Aficionado.
97k Views. Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Pronojit is a Most Viewed Writer in Jobs and Careers in Data Science.
Originally Answered: How do I become a data scientist?

SELF-STARTER WAY
For a self-starter novice, here is an outline that one can start with (this is reproduced from
my blog: How to acquire the "Essential Skill Set"? - the Self-Starter way).
0. Basic Prerequisites:
Mathematics, Algorithms & Databases: Mathispower4u Calculus, Coursera
Linear Algebra, Coursera Analysis of Algorithms, Coursera Introduction to
Databases
Statistics: Probability and Statistics for Programmers, Statistical Formulas For
Programmers, Coursera Data Analysis, Coursera Statistics One
Programming: Google Developers R Programming Lectures, Introduction to R
DataCamp, Scientific Python Lectures, How to Think Like a Computer
Scientist

1. Acquire & Scrub Data:
DFS & Databases: Hadoop Tutorial Yahoo, AMPCamp Berkeley Spark
Introduction & Exercises, Intro to Hadoop & MapReduce for Beginners
Udacity, Big Data University: Big Data, All-out beginner's guide to MongoDB
Data Munging: Predictive Analytics: Data Preparation, Data Wrangling in
Pandas, Analyzing and Manipulating Data with Pandas, Data Wrangler,
OpenRefine

2. Filter & Mine data:
Data Analysis in R: Data science in R, Coursera Computing for Data Analysis in
R
Data Analysis in Python (numpy, scipy, pandas, scikit): Getting Started With
Python For Data Science, Introduction to NumPy SciPyConf 2015, Statistical
Data Analysis in Python, Pandas (1st Video Below), SciPy 2013 Introduction to
SciKit Learn Tutorial I & II (2nd & 3rd Video Below)

Exploratory Data Analysis: Exploratory Data Analysis in R, Exploratory Data
Analysis in Python, UC Berkeley: Descriptive Statistics, Basic Unix Shell
Commands for the Data Scientist
Data Mining, Machine Learning: Data Mining Map, Coursera Machine Learning,
Stanford Statistical Learning, MITx: The Analytics Edge, STATS 202 Data Mining & Analysis,
Mining Massive Data Sets Stanford, Learning From Data CalTech,
Coursera Web Intelligence & Big Data

3. Represent & Refine Data: Tableau Training & Tutorials, Data visualisation in R with
ggplot2 and plyr, Predictive Analytics: Overview and Data visualization, FlowingData
Tutorials, UC Berkeley Data Visualization, D3.js Tutorial
4. Domain Knowledge: This skill is developed through experience working in an industry.
Each dataset is different and comes with certain assumptions and industry knowledge. For
example, a data analyst specializing in stock market data would need time to develop
knowledge in analyzing transactional data for restaurants.
Combining all the above:
Data Literacy Course IAP
Coursera Introduction to Data Science
Coursera Data Science Specialization
Books:
Elements of Statistical Learning
Python Machine Learning
Apply the knowledge:
Harvard Data Science Course Homework
Kaggle: The Home of Data Science
Analyzing Big Data with Twitter
Analyzing Twitter Data with Apache Hadoop

FORMAL WAY
For a more formal way of becoming a data scientist one can look into this post
(reproduced below): How to acquire the "Essential Skill Set"? - the Formal way.
The Essential Skill Set comprises the basic fundamental skills which every data scientist is
expected to know. Traditionally, these can be acquired by undertaking a computer science
degree or a statistics degree from an institution. The Stanford Computer Science courses
& Statistics courses provide a good reference list of courses to undertake. Now some of
the courses are relevant while many others are not. For example, in Computer Science,
while one would do well to learn about large-scale distributed databases & algorithms,
there is no need for learning HCI and UX, or pure-play storage and operating systems,
networking, etc. Similarly, some statistics courses focus too much on, let's say, "old school
statistics", including thousands of ways of hypothesis testing, instead of more on machine
learning (clustering, regression, classification, etc). So both the streams have many
nice-to-have courses and must-have courses for a data scientist (I dare to claim that at
present the percentage of must-have courses seems to be greater in a traditional Statistics
stream than a Computer Science stream). As such one needs to pick the courses wisely.
Or alternatively, one can also look into a number of new Data Science courses that some
universities are offering, built around the points I mentioned above. They combine the
must-have courses from both the traditional statistics and computer science programs to
impart the 4 Essential Skills as well as include courses to develop the Differentiator Skills
in students. The MS in Data Science at NYU & MS in Analytics at USF are good examples
of such amalgamation of the requisite courses. A complete list of such courses is presented
here: Colleges with Data Science Degrees.
The correct program obviously depends on the individual's goal. One of the recent O'Reilly
publications titled 'Analyzing the Analyzers' does a very good job in aggregating the
various data scientist roles into 4 main categories as per their skills. An individual may
therefore select a program as per the category of data scientist he most identifies himself
with, as shown below.
Data Businesspeople are the product and profit-focused data scientists.
They're leaders, managers, and entrepreneurs, but with a technical bent. A
common educational path is an engineering degree paired with an MBA or the
new Data Science programs as mentioned above.
Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range
of data and tools. They may think of themselves as artists or hackers, and excel at
visualization and open source technologies. They are expected to have an
engineering degree (mostly in statistics or economics) but not much in business
skills.
Data Developers are focused on writing software to do analytic, statistical, and
machine learning tasks, often in production environments. They often have
computer science degrees, and often work with so-called "big data".
Data Researchers apply their scientific training, and the tools and techniques
they learned in academia, to organizational data. They may have an MS or PhD in
statistics, economics, physics, etc., and their creative applications of mathematical
tools yield valuable insights and products.
The skills associated with the 4 main categories, which justify the above-mentioned
program recommendation, are as below:

Updated Nov 8
Alex Kamil
260.1k Views. Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist; Robert
Chang, Data Janitor @Twitter | Taiwanese American | Statistically educated | Aspiring singer;
Jack Golding, Australian Data Engineer; Marc Bodnick
Answer featured in Forbes.
Originally Answered: How do I become a data scientist?

Strictly speaking, there is no such thing as "data science" (see What is data science?). See
also: Vardi, Science has only two legs: http://portal.acm.org/ft_gateway...
Here are some resources I've collected about working with data. I hope you find them
useful (note: I'm an undergrad student, this is not an expert opinion in any way).
1) Learn about matrix factorizations
Take the Computational Linear Algebra course (it is sometimes called Applied
Linear Algebra or Matrix Computations or Numerical Analysis or Matrix Analysis
and it can be either a CS or an Applied Math course). Matrix decomposition
algorithms are fundamental to many data mining applications and are usually
underrepresented in a standard "machine learning" curriculum. With TBs of data,
traditional tools such as Matlab are no longer suitable for the job; you cannot just
run eig() on Big Data. Distributed matrix computation packages such as those
included in Apache Mahout [1] are trying to fill this void, but you need to
understand how the numeric algorithms/LAPACK/BLAS routines [2][3][4][5]
work in order to use them properly, adjust for special cases, build your own, and
scale them up to terabytes of data on a cluster of commodity machines. [6]
Usually numerics courses are built upon undergraduate algebra and calculus, so
you should be good with prerequisites. I'd recommend these resources for
self-study/reference material:
See Jack Dongarra: Courses and What are some good resources for learning
about numerical analysis?
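
To give a feel for what sits behind a call like eig(), here is a minimal sketch of power iteration, one of the simplest eigenvalue methods. This is a swapped-in teaching example: it is not what LAPACK's eigenvalue routines or Mahout actually implement, and the function name and the 2x2 matrix are made up. Its appeal for large data is that it touches the matrix only through matrix-vector products, which are the kind of operation that can be split across machines.

```python
import math


def power_iteration(matrix, steps=200):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(steps):
        # One matrix-vector product per step: the only access to the matrix.
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient v^T A v (v has unit length after normalization).
    av = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * a for v, a in zip(vec, av))
    return eigenvalue, vec


# Small symmetric example; its eigenvalues are (5 + sqrt(5))/2 and (5 - sqrt(5))/2.
eigenvalue, eigenvector = power_iteration([[2.0, 1.0], [1.0, 3.0]])
print(eigenvalue, eigenvector)
```

The method converges slowly when the two largest eigenvalues are close in magnitude and only finds the dominant one, which is exactly the kind of special case a numerics course teaches you to recognize.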

2) Learn about distributed computing
It is important to learn how to work with a Linux cluster and how to design
scalable distributed algorithms if you want to work with big data (Why the
current obsession with big data?).
Crays and Connection Machines of the past can now be replaced with farms of
cheap cloud instances; the computing costs dropped to less than $1.80/GFlop in
2011 vs $15M in 1984: http://en.wikipedia.org/wiki/FLOPS .
If you want to squeeze the most out of your (rented) hardware, it is also becoming
increasingly important to be able to utilize the full power of multicore (see
http://en.wikipedia.org/wiki/Moo... )
Note: this topic is not part of a standard Machine Learning track, but you can
probably find courses such as Distributed Systems or Parallel Programming in
your CS/EE catalog. See distributed computing resources, a systems course at
UIUC, key works, and for starters: Introduction to Computer Networking.
After studying the basics of networking and distributed systems, I'd focus on
distributed databases, which will soon become ubiquitous with the data deluge
and hitting the limits of vertical scaling. See key works, research trends and for
starters: Introduction to relational databases and Introduction to distributed
databases (HBase in Action).

3) Learn about statistical analysis
Start learning statistics by coding with R: What are essential references for R?
and experiment with real-world data: Where can I find large datasets open to the
public?
Cosma Shalizi compiled some great materials on computational statistics; check
out his lecture slides, and also What are some good resources for learning about
statistical analysis?
I've found that learning statistics in a particular domain (e.g. Natural Language
Processing) is much more enjoyable than taking Stats 101. My personal
recommendation is the course by Michael Collins at Columbia (also available on
Coursera).
You can also choose a field where the use of quantitative statistics and causality
principles [7] is inevitable, say molecular biology [8], or a fun subfield such as
cancer research [9], or an even narrower domain, e.g. genetic analysis of tumor
angiogenesis [10], and try answering important questions in that particular field,
learning what you need in the process.

4) Learn about optimization
This subject is essentially a prerequisite to understanding many Machine Learning
and Signal Processing algorithms, besides being important in its own right.
Start with Stephen P. Boyd's video lectures and also What are some good
resources to learn about optimization?

5) Learn about machine learning
Before you get to think about algorithms, look carefully at the data and select
features that help you filter signal from noise. See this talk by Jeremy Howard: At
Kaggle, It's a Disadvantage To Know Too Much
Also see How do I learn machine learning? and What are some introductory
resources for learning about large scale machine learning? Why?
Statistics vs. machine learning, fight!: http://brenocon.com/blog/2008/12...
You can structure your study program according to online course catalogs
and curricula of MIT, Stanford or other top schools. Experiment with
data a lot, hack some code, ask questions, talk to good people, set up a web
crawler in your garage: The Anatomy of a Search Engine
You can join one of these startups and learn by doing: Natural Language
Processing: What startups are hiring engineers with strengths in machine
learning/NLP?
The alternative (and rather expensive) option is to enroll in a CS
program/Machine Learning track if you prefer studying in a formal
setting. See: What makes a Master's in Computer Science (MS CS) degree worth it
and why?
Try to avoid overspecialization. The breadth-first approach often works best when
learning a new field and dealing with hard problems; see the Second voyage of
HMS Beagle on the adventures of an ingenious young data miner.

6) Learn about information retrieval
Machine learning is not as cool as it sounds: http://teddziuba.com/2008/05/mac...
What are some good resources to get started with Information Retrieval? Why?

7) Learn about signal detection and estimation
This is a classic topic and "data science" par excellence in my opinion.
Some of these methods were used to guide the Apollo mission or detect
enemy submarines and are still in active use in many fields. This is
often part of the EE curriculum.
Good references are Robert F. Stengel's lecture slides on optimal control and
estimation: Rob Stengel's Home Page, Alan V. Oppenheim's Signals and
Systems, and What are some good resources for learning about signal
estimation and detection? A good topic to focus on first is the Kalman filter, widely
used for time series forecasting.
Talking about data, you probably want to know something about information: its
transmission, compression and filtering signal from noise. The methods
developed by communication engineers in the 60s (such as the Viterbi decoder, now
used in about a billion cellphones) are applicable to a surprising variety of data
analysis tasks, from statistical machine translation to understanding the
organization and function of molecular networks. A good resource for starters is
Information Theory and Reliable Communication: Robert G. Gallager:
9780471290483: Amazon.com: Books. Also What are some good resources for
learning about information theory?
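
As a starting point for the Kalman filter mentioned above, here is a minimal one-dimensional sketch. It assumes the simplest possible model (a scalar that drifts as a random walk and is observed with noise); the function name, the noise variances, and the readings are all invented for illustration, and real applications usually track a multi-dimensional state.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Filter noisy readings of a slowly drifting scalar (random-walk model)."""
    estimate, variance = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed to stay put, but uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement, weighted by the Kalman gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1 - gain
        estimates.append(estimate)
    return estimates


# Hypothetical noisy sensor readings around a value near 10.
readings = [10.2, 9.7, 10.4, 9.9, 10.1, 10.3, 9.8]
smoothed = kalman_1d(readings, process_var=1e-4, measurement_var=0.25,
                     initial_estimate=readings[0])

for raw, est in zip(readings, smoothed):
    print(f"raw={raw:.1f}  filtered={est:.3f}")
```

The gain shrinks as the filter becomes more confident, so later readings move the estimate less than early ones; raising process_var makes the filter more responsive and less smooth.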

8) Master algorithms and data structures
What are the most learner-friendly resources for learning about algorithms?

9) Practice
Getting In Shape For The Sport Of Data Science
Carpentry: http://softwarecarpentry.org/
Data Science: What are some good toy problems in data science?
Tools: What are some of the best data analysis tools?
Where can I find large datasets open to the public?

If you do decide to go for a Masters degree:
10) Study Engineering
I'd go for CS with a focus on either IR or Machine Learning or a combination of both and
take some systems courses along the way. As a "data scientist" you will have to write a ton
of code and probably develop distributed algorithms/systems to process massive amounts
of data. An MS in Statistics will teach you how to do modeling and regression analysis etc.,
not how to build systems. I think the latter is more urgently needed these days as the old
tools become obsolete with the avalanche of data. There is a shortage of engineers who can
build a data mining system from the ground up. You can pick up statistics from books and
experiments with R (see item 3 above) or take some statistics classes as a part of your CS
studies.
Good luck.
[1] http://mahout.apache.org/
[2] http://www.netlib.org/lapack/
[3] http://www.netlib.org/eispack/
[4] http://math.nist.gov/javanumeric...
[5] http://www.netlib.org/scalapack/
[6] http://labs.google.com/papers/ma...
[7] Amazon.com: Causality: Models, Reasoning and Inference (9780521895606): Judea
Pearl: Books
[8] Introduction to Biology, MIT 7.012 video lectures
[9] Hanahan & Weinberg, The Hallmarks of Cancer, Next Generation: Page on Wisc
[10] The chaotic organization of tumor-associated vasculature, from The Biology of
Cancer: Robert A. Weinberg: 9780815342205: Amazon.com: Books, p. 562

Updated Nov 17, 2013
Katie Kent, Director of Educational Outcomes @Galvanize, Employee #1 @Zipfian
Academy
184k Views. Upvoted by Jason Zhang, Data Scientist at Quora
Katie is a Most Viewed Writer in Jobs and Careers in Data Science.

Become a Data Scientist by Doing Data Science
The best way to become a data scientist is to learn and do data science. There are
many excellent courses and tools available online that can help you get there.
Here is an incredible list of resources compiled by Jonathan Dinu, Co-founder of Zipfian
Academy, which trains data scientists and data engineers in San Francisco via immersive
programs, fellowships, and workshops.
EDIT: I've had several requests for a permalink to this answer. See here: A Practical Intro
to Data Science from Zipfian Academy
EDIT 2: See also: "How to Become a Data Scientist" on SlideShare:
http://www.slideshare.net/ryanor...
Environment
Python is a great programming language of choice for aspiring data scientists due to its
general-purpose applicability, a gentle (or firm) learning curve, and (perhaps the
most compelling reason) the rich ecosystem of resources and libraries actively used by
the scientific community.
Development
When learning a new language in a new domain, it helps immensely to have an interactive
environment to explore and to receive immediate feedback. IPython provides an interactive
REPL which also allows you to integrate a wide variety of frameworks (including R) into
your Python programs.
STATISTICS
Data scientists are better at software engineering than statisticians and better at statistics
than any software engineer. As such, statistical inference underpins much of the theory
behind data analysis and a solid foundation of statistical methods and probability serves as
a stepping stone into the world of data science.
Courses
edX: Introduction to Statistics: Descriptive Statistics: A basic introductory statistics
course.
Coursera Statistics, Making Sense of Data: An applied Statistics course that teaches the
complete pipeline of statistical analysis
MIT: Statistical Thinking and Data Analysis: Introduction to probability, sampling,
regression, common distributions, and inference.

While R is the de facto standard for performing statistical analysis, it has quite a steep
learning curve, and there are other areas of data science for which it is not well suited. To
avoid learning a new language for a specific problem domain, we recommend trying to
perform the exercises of these courses with Python and its numerous statistical libraries.
You will find that much of the functionality of R can be replicated with NumPy,
SciPy, Matplotlib, and the Python Data Analysis Library (pandas).
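As a rough illustration of the kind of exercise these courses involve, the sketch below
computes descriptive statistics and an ordinary least-squares line fit in plain Python. The
data and the function names (describe, fit_line) are made up for this example, and only the
standard library is used so that it runs without installing anything. In practice the
libraries named above offer the same in a single call (for example numpy.polyfit or
scipy.stats.linregress).

```python
import statistics


def describe(values):
    """Return mean, median, and sample standard deviation of a sequence."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

summary = describe(scores)
slope, intercept = fit_line(hours, scores)
print(summary)
print(f"score ~ {slope:.2f} * hours + {intercept:.2f}")
```

Writing the fit by hand once makes it easier to check what a library call returns when you
later switch to NumPy or SciPy.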
Books
Well-written books can be a great reference (and supplement) to these courses, and also
provide a more independent learning experience. These may be useful if you already have
some knowledge of the subject or just need to fill in some gaps in your understanding:
O'Reilly Think Stats: An introduction to probability and statistics for Python
programmers.
Introduction to Probability: Textbook for Berkeley's Stats 134 class, an introductory
treatment of probability with complementary exercises.
Berkeley Lecture Notes, Introduction to Probability: Compiled lecture notes of the above
textbook, complete with exercises.
OpenIntro: Statistics: Introductory textbook with supplementary exercises and labs in
an online portal.
Think Bayes: A simple introduction to Bayesian statistics with Python code examples.
MACHINE LEARNING/ALGORITHMS
A solid base of computer science and algorithms is essential for an aspiring data scientist.
Luckily there is a wealth of great resources online, and machine learning is one of the
more lucrative (and advanced) skills of a data scientist.
Courses
Coursera Machine Learning: Stanford's famous machine learning course taught by
Andrew Ng.
Coursera: Computational Methods for Data Analysis: Statistical methods and data
analysis applied to physical, engineering, and biological sciences.
MIT Data Mining: An introduction to the techniques of data mining and how to apply
ML...

Peter Skomoroch, Sr. Data Scientist @ LinkedIn
94.7k Views. Upvoted by Jason Zhang (Data Scientist at Quora), Marc Bodnick, and 1 other
Originally Answered: How do I become a data scientist?

If you have the time to take courses, give it a shot.
1) Try to take some of the undergrad math courses you missed. Linear Algebra, Advanced
Calculus, Diff. Eq., Probability, and Statistics are the most important. After that, take some
Machine Learning courses. Read a few of the leading ML textbooks and keep up with
journals to get a good sense of the field.
2) Read up on what the top data companies are doing. After 1 or 2 machine learning
courses you should have enough background to follow most of the academic papers.
Implement some of these algorithms on real data.
3) If you are working with large datasets, get familiar with the latest techniques & tools
(Hadoop, NoSQL, Spark, etc.) by putting them into practice at work (or outside of work).
4) A big part of data science on the product development side is essentially software
engineering, and being able to create, modify and implement algorithms. As William Chen
mentioned, many data scientists know Python, R, scikit-learn, etc., but that is mostly for
analysis or prototyping. If you need to implement anything at scale or within production
systems you will likely need to know how to write code in something like Java or C++.
Check out the book The Pragmatic Programmer: From Journeyman to Master
(9780201616224) by Andrew Hunt and David Thomas, and the Software Carpentry course,
if you are coming to software development from a science background.
I did a TCTV interview recently with Semil Shah where we went into more depth on how to
become a data scientist:
http://techcrunch.com/2012/09/06...
Updated Apr 10, 2014

Pathan Karimkhan, Data science excites me!
28.7k Views. Upvoted by Robert Chang, Data Janitor @ Twitter | Taiwanese American |
Statistically educated | Aspiring singer
Pathan has 70+ answers in Big Data.

Being a data scientist requires a solid foundation, typically in computer science and
applications, modeling, statistics, analytics, and math.
What sets the data scientist apart is strong business acumen, coupled with the ability to
communicate findings to both business and IT leaders in a way that can influence how an
organization approaches a business challenge. Good data scientists will not just address
business problems; they will pick the right problems that have the most value to the
organization.
I also believe in-depth knowledge of data science, machine learning, and NLP will help to
solve issues from the ground level to the top level. 4-5 years of development experience
can give such acumen.

Introduction to CS Course
Notes: Introduction to Computer Science course that provides instruction on
coding.
Online Resources:
Udacity Intro to CS course,
Coursera Computer Science 101

Code in at least one object-oriented programming language: C++,
Java, or Python
Beginner Online Resources:
Coursera Learn to Program: The Fundamentals,
MIT Intro to Programming in Java,
Google's Python Class,
Coursera Introduction to Python,
Python Open Source E-Book
Intermediate Online Resources:
Udacity's Design of Computer Programs,
Coursera Learn to Program: Crafting Quality Code,
Coursera Programming Languages,
Brown University Introduction to Programming Languages
Learn other Programming Languages
Notes: Add to your repertoire JavaScript, CSS, HTML, Ruby, PHP, C, Perl,
Shell, Lisp, Scheme.
Online Resources: w3school.com HTML Tutorial, Learn to code
Test Your Code
Notes: Learn how to catch bugs, create tests, and break your software.
Online Resources: Udacity Software Testing Methods, Udacity Software
Debugging
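A minimal sketch of what "create tests and break your software" can look like in Python,
using the standard-library unittest module. The median function and its test cases are
hypothetical, chosen only to show a test for normal input, an edge case, and deliberately
bad input.

```python
import unittest


def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


class MedianTests(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_input_is_not_modified(self):
        data = [3, 1, 2]
        median(data)
        self.assertEqual(data, [3, 1, 2])

    def test_empty_input_raises(self):
        # Deliberately try to break the function with bad input.
        with self.assertRaises(ValueError):
            median([])


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MedianTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

The last test is the "break your software" part: it feeds the function input it cannot
handle and checks that it fails loudly instead of returning a wrong answer.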
Develop logical reasoning and knowledge of discrete math
Online Resources:
MIT Mathematics for Computer Science,
Coursera Introduction to Logic,
Coursera Linear and Discrete Optimization,
Coursera Probabilistic Graphical Models,
Coursera Game Theory.
Develop strong understanding of Algorithms and Data Structures
Notes: Learn about fundamental data types (stacks, queues, and bags), sorting
algorithms (quicksort, mergesort, heapsort), data structures (binary search
trees, red-black trees, hash tables), and Big O.
Online Resources:
MIT Introduction to Algorithms,
Coursera Introduction to Algorithms Part 1 & Part 2,
Wikipedia List of Algorithms,
Wikipedia List of Data Structures,
Book: The Algorithm Design Manual
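To make one of the sorting algorithms named above concrete, here is a short sketch of
top-down mergesort in Python. The function names (merge_sort, _merge) are chosen for this
example and are not taken from any of the listed courses. It runs in O(n log n) time and
uses O(n) extra space.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    return _merge(left, right)


def _merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" keeps equal elements in their original order (stable sort).
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two lists still has elements left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # prints [1, 2, 5, 5, 6, 9]
```

Each level of recursion does a linear amount of merging work and there are about log2(n)
levels, which is where the O(n log n) bound comes from.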
Develop a strong knowledge of operating systems
Online Resources: UC Berkeley Computer Science 162 ...