Before we begin!
You might want.
What's going on here?
Today we're going to:
Examine our data resources!
Try some scraping!
Try some pulling!
Mess around with an API!
Say hello to visualization!
Data? I hardly knew 'a!
Data: any discrete unit and its meta-information
Useful data: more than one record of data... but that second record can be in your head!
Everything is numbers!
Internal Use
External Use
Tell me more of this data of which you speak!
Real-time
Static datasets
Data is powerful!
The act of measuring something solidifies its state.
Ahh, the power!!!
Data is misleading!
Choosing one source over another
Only portraying parts of the statistic
Choosing a biased method of portrayal
Information overload: don't believe the hype
Flavors of data
Indexed data
    documents, weblogs, images, videos, shopping articles, jobs...
Cartographic and geographic data
    Geolocation software, Geovisualization
News Aggregators
    Feeds, podcasts:
DATATYPE!
Straight text
CSV/tab-delimited
XML/RSS/ATOM
Would it fit in here? Then it's data!
JSON
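To make the datatypes concrete, here is a minimal sketch of one record written as CSV, XML, and JSON and read back in PHP. The record and its field names (city, temp_f) are made up for illustration; they do not come from any of the sources in this deck.

```php
<?php
// Hypothetical sample record, written three ways.
$csv  = "city,temp_f\nNew York,72\n";
$xml  = "<reading><city>New York</city><temp_f>72</temp_f></reading>";
$json = '{"city": "New York", "temp_f": 72}';

// CSV: split into lines, then parse each line into fields.
$lines   = explode("\n", trim($csv));
$header  = str_getcsv($lines[0], ',', '"', '\\');
$fields  = str_getcsv($lines[1], ',', '"', '\\');
$fromCsv = array_combine($header, $fields);

// XML: SimpleXML exposes child elements as properties.
$node    = simplexml_load_string($xml);
$fromXml = array(
    'city'   => (string) $node->city,
    'temp_f' => (string) $node->temp_f,
);

// JSON: decode into an associative array.
$fromJson = json_decode($json, true);

echo $fromCsv['city'] . ' / ' . $fromXml['city'] . ' / ' . $fromJson['city'] . "\n";
```

Note that CSV and XML hand every value back as text, while JSON keeps the number as a number.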
VIA
Text file
Data feed
Scraping HTML
API
Some combination
DESTINATION
Spreadsheet (by hand)
Browser (direct, javascript, php, perl...)
Database (via SQL using php, perl, etc....)
Application (Processing, java, python)
A second API
Mom, where does data come from?
HTML for scraping: Anywhere you can see text online
    Weather.com
    Yahoo trending topics
Preformatted datasets: Anywhere it's available
    Amazon datasets
    opendata.gov
Real-time RSS feeds: Anywhere there's a data feed
    Any blog feed
    Any news feed
Personalized, awesome, targeted data: Anywhere with an API.
    New York Times API
    Twitter API
Choose wisely!
DATATYPE   VIA        DESTINATION
xml/rss    Browser    Excel
csv        textfile   php:database
xml        api        php:browser
xml        api        javascript:browser
html       scraping   php browser
csv        textfile   Processing
html       scraping   Processing
xml        browser    Processing
(through php)
Example 1 and 2
Datatype         VIA        Destination
HTML             SCRAPING   BROWSER
(Weather info)   (PHP)      (Firefox, or whatever)
Step one: Get to know your data:
http://www.weather.com/weather/today/New+York+NY+10010?lswe=10010
Step two: Set up the code
Example 1: Straight scrapin'
<?php
// Get the data!
$url = 'http://www.weather.com/weather/today/New+York+NY+10010?lswe=10010';
$output = file_get_contents($url);
if ($output === false) {
    die('Could not fetch ' . $url);
}

// Do something with it!
echo $output;
?>
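Live pages time out, move, or refuse requests, so it can help to wrap the fetch in a small function that reports failure instead of printing a warning. The sketch below is one way to do that; the function name fetchPage and the 10-second timeout are illustrative choices, and a data: URL stands in for a real page so the example runs offline.

```php
<?php
// Fetch a URL and return its body, or null if the request fails.
function fetchPage($url, $timeoutSeconds = 10)
{
    // The timeout applies to http:// requests; other wrappers ignore it.
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeoutSeconds),
    ));
    // @ suppresses the warning PHP raises on failure; we check for false instead.
    $body = @file_get_contents($url, false, $context);
    return ($body === false) ? null : $body;
}

// A data: URL stands in for a live page here.
$fakeUrl = 'data://text/plain;base64,' . base64_encode('<h1>72 degrees</h1>');
$page    = fetchPage($fakeUrl);

// strip_tags() drops the markup and keeps the visible text.
echo ($page === null) ? "Fetch failed\n" : strip_tags($page) . "\n";
```

Swapping $fakeUrl for a real address is the only change needed to point this at a live page.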
Example 2: Scraping with a purpose
<?php
// Get everything ready
$currentTerm = NULL; // we'll use this to hold the words!
$myUrl = "http://www.google.com/trends/hottrends/atom/hourly";
$searchForStart = "sa=X\">";
$searchForEnd = "</a>";
$offset = 0; // where in the page to resume searching

// Get the data!
$rawPage = file_get_contents($myUrl);

// Do something with it!
echo "<B>These are this hour's trending topics on Google!</b><BR><BR>";
// As long as there's more stuff to find, find it!
while (($startPos = strpos($rawPage, $searchForStart, $offset)) !== false) {
    $termStart = $startPos + strlen($searchForStart); // skip past the marker
    $endPos = strpos($rawPage, $searchForEnd, $termStart); // and then find where it ends!
    if ($endPos === false) { // no closing tag? Then we're done
        break;
    }
    $length = $endPos - $termStart; // how long is this string we've found, anyway?
    $currentTerm = substr($rawPage, $termStart, $length);
    echo $currentTerm . "<BR>";
    $offset = $endPos + strlen($searchForEnd); // carry on after the closing tag
} // end while
?>
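The same find-the-start, find-the-end idea can be sketched as a reusable function and tried on a fixed string before pointing it at a live page. The function name extractBetween and the sample markup are made up for illustration; only the two marker strings come from the example.

```php
<?php
// Collect every substring that sits between a $start and an $end marker.
function extractBetween($haystack, $start, $end)
{
    $found  = array();
    $offset = 0;
    while (($startPos = strpos($haystack, $start, $offset)) !== false) {
        $from   = $startPos + strlen($start);
        $endPos = strpos($haystack, $end, $from);
        if ($endPos === false) {
            break; // opening marker without a closing one: stop looking
        }
        $found[] = substr($haystack, $from, $endPos - $from);
        $offset  = $endPos + strlen($end);
    }
    return $found;
}

// Made-up markup in the shape the example searches for.
$sample = '<a href="/a?sa=X">world cup</a> <a href="/b?sa=X">summer solstice</a>';
$terms  = extractBetween($sample, 'sa=X">', '</a>');
echo implode("<BR>", $terms) . "\n";
```

Comparing strpos() against false with !== matters here: a match at position 0 is a valid result, but a plain truthiness check would treat it as "not found".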
Example 3
Datatype   VIA               Destination
XML        RSS FEED          BROWSER
           (Huffington Post) (PHP)   (Firefox, or whatever)
Step one: Get to know your data:
http://feeds.huffingtonpost.com/huffingtonpost/raw_feed
Step two: Set up the code
What's this XML stuff?
<introductory tags>
<entry>
    <title></title>
    <id></id>
    <published></published>
    <updated>2010-06-19T15:50:45Z</updated>
    <summary></summary>
    <author>
        <name></name>
        <uri>http://www.huffingtonpost.com/annenaylor/</uri>
    </author>
    <content></content>
</entry>
Get the data!
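One way to read entries shaped like the skeleton above is PHP's SimpleXML extension. This is a minimal sketch, not the deck's own code: the sample feed, its headlines, and its author names are invented so the example runs without a network connection.

```php
<?php
// A trimmed, made-up feed with the same entry layout as the skeleton.
$sampleFeed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First headline</title>
    <updated>2010-06-19T15:50:45Z</updated>
    <author><name>Example Author</name></author>
  </entry>
  <entry>
    <title>Second headline</title>
    <updated>2010-06-19T16:05:00Z</updated>
    <author><name>Another Author</name></author>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($sampleFeed);
if ($feed === false) {
    die("Could not parse the feed\n");
}

// Each <entry> becomes an object; child tags are read as properties.
$headlines = array();
foreach ($feed->entry as $entry) {
    $headlines[] = (string) $entry->title . ' (' . (string) $entry->author->name . ')';
}
echo implode("<BR>", $headlines) . "\n";
```

For a live feed, simplexml_load_file() accepts a URL in place of the string, provided the server allows PHP to open remote files.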
Example 4
Datatype              VIA         Destination
XML                   data FEED   API and BROWSER
(US Exchange rates)   (PHP)       (Google Charts API, Firefox, or whatever)
Step one: Get to know your data:
http://rss.timegenie.com/forex.xml
Step two: Set up the code
APIs? Eh?
Data
    all the types of data we discussed before
Functionality
    Data converters: language translators, speech processing, URL shorteners
    Communication: email, IM, notifications
    Visual data rendering: information visualization, diagrams, maps
    Security related: electronic payment systems, ID identification...
Example 4: Doing the two-step
Get the data!
Get it in a form we can use
Run it through a second process
Do something with it (like displaying that baby!)
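The four steps can be sketched roughly as follows. Several things here are assumptions: the element names in the sample XML (data, code, rate) are a guess at the feed's layout, the rates are invented, and the helper buildChartUrl is hypothetical. The chart URL follows the parameter style of Google's legacy Image Charts service (cht, chs, chd, chds, chxt, chxl), which has since been retired, so treat the second step as an illustration of chaining two services rather than a working endpoint.

```php
<?php
// Step 1: get the data. A made-up sample stands in for the live feed.
$sampleXml = <<<XML
<forex>
  <data><code>EUR</code><rate>0.81</rate></data>
  <data><code>GBP</code><rate>0.67</rate></data>
  <data><code>CAD</code><rate>1.02</rate></data>
</forex>
XML;

// Step 2: get it in a form we can use (currency code => rate).
$rates = array();
foreach (simplexml_load_string($sampleXml)->data as $row) {
    $rates[(string) $row->code] = (float) $row->rate;
}

// Step 3: run it through a second process, here a chart URL builder.
function buildChartUrl(array $rates)
{
    $query = http_build_query(array(
        'cht'  => 'bvs',                                   // vertical bar chart
        'chs'  => '300x200',                               // size in pixels
        'chds' => '0,' . max($rates),                      // data scale: min,max
        'chd'  => 't:' . implode(',', $rates),             // the values
        'chxt' => 'x',                                     // show an x axis
        'chxl' => '0:|' . implode('|', array_keys($rates)) // x axis labels
    ));
    return 'https://chart.googleapis.com/chart?' . $query;
}

// Step 4: do something with it (like displaying that baby!).
$chartUrl = buildChartUrl($rates);
echo '<img src="' . htmlspecialchars($chartUrl) . '" alt="Exchange rates">' . "\n";
```

Keeping the chart step in its own function means the same $rates array could be handed to a different renderer without touching the parsing code.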
Bringing data into a higher-level application like Processing!
Install the simpleML library:
http://www.learningprocessing.com/tutorials/simpleml/
Inspect your data for structure
Write some code!
Declare your XML intent!
Make the request!
Process the request!
Do fun stuff with it!
Go out and do some scraping!
Zoe Fraade-Blanar
Fraade@gmail.com
www.binaryspark.com