Mar. 09, 2016
Web & Internet
Web Scraping with Beautiful Soup
Web Scraping
"Web scraping (web harvesting or web data extraction) is a computer software
technique of extracting information from websites."
HTML parsing is easy in Python, especially with the help of the Beautiful Soup library.
In this post we will scrape a website (our own) to extract all of its URLs.
Getting Started
To begin with, make sure that you have the necessary modules installed.
In the example below, we are using Beautiful Soup 4 and Requests on a system with
Python 2.7 installed.
Installing Beautiful Soup and Requests can be done with pip:
$ pip install requests
$ pip install beautifulsoup4
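One way to confirm that both installs worked is to try importing the packages. The sketch below uses a hypothetical helper named missing_modules (it is not part of pip or of either library). Note that the beautifulsoup4 package is imported under the name bs4.

```python
def missing_modules(names):
    """Return the module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means both modules are available
print(missing_modules(["bs4", "requests"]))
```

If the printed list is not empty, rerun the pip commands above for the names it contains.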
What is Beautiful Soup?
http://www.pythonforbeginners.com/pythonontheweb/webscrapingwithbeautifulsoup/
9/28/2016 Web Scraping with Beautiful Soup
At the top of their website, you can read: "You didn't write that awful page.
You're just trying to get some data out of it. Beautiful Soup is here to help.
Since 2004, it's been saving programmers hours or days of work on quick-turnaround
screen scraping projects."
Beautiful Soup Features:
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating,
searching, and modifying a parse tree: a toolkit for dissecting a document and
extracting what you need. It doesn't take much code to write an application.
Beautiful Soup automatically converts incoming documents to Unicode and outgoing
documents to UTF-8. You don't have to think about encodings, unless the document
doesn't specify an encoding and Beautiful Soup can't autodetect one.
Then you just have to specify the original encoding.
Beautiful Soup sits on top of popular Python parsers like lxml and html5lib,
allowing you to try out different parsing strategies or trade speed for
flexibility.
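To make these features concrete, here is a minimal sketch that navigates, searches, and modifies a small parse tree, and that specifies the original encoding by hand. The HTML snippet is made up for illustration, and Python's built-in "html.parser" is chosen as the parser so that neither lxml nor html5lib needs to be installed.

```python
from bs4 import BeautifulSoup

# Raw Latin-1 bytes with no declared charset (illustrative document)
raw = (b"<html><head><title>Caf\xe9 menu</title></head><body>"
       b"<p class='intro'>Hello</p>"
       b"<a href='/one'>One</a> <a href='/two'>Two</a>"
       b"</body></html>")

# Specify the original encoding explicitly and name the parser to use
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

# Navigating: walk from the document down to the <title> text
title = soup.title.string

# Searching: collect the href of every <a> tag
hrefs = [a["href"] for a in soup.find_all("a")]

# Modifying: replace the text inside the first <p>
soup.p.string = "Welcome"

# Outgoing documents can be encoded as UTF-8 bytes
out = soup.encode("utf-8")

print(hrefs)
print(soup.p.get_text())
```

Swapping "html.parser" for "lxml" or "html5lib" (when those packages are installed) is how you would try a different parsing strategy.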
Extracting URLs from any website
Now that we know what BS4 is and have installed it on our machine,
let's see what we can do with it.
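from bs4 import BeautifulSoup
import requests

# Python 2.7: raw_input() reads a line of text (use input() on Python 3)
url = raw_input("Enter a website to extract the URL's from: ")
# The scheme is prepended here, so enter the host name without "http://"
r = requests.get("http://" + url)
data = r.text

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")

# Print the href of every <a> tag (None when a tag has no href)
for link in soup.find_all('a'):
    print(link.get('href'))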
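The script above prepends "http://" unconditionally and prints every href exactly as it appears in the page, including None and relative paths. A hedged variant, split into two small functions that can be tried without a network connection, might look like this. The names with_scheme and extract_links are hypothetical helpers, not part of Beautiful Soup or Requests.

```python
from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7


def with_scheme(address, default="http"):
    """Prepend a scheme only when the user did not type one."""
    if "://" in address:
        return address
    return default + "://" + address


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors without an href attribute
    for a in soup.find_all("a", href=True):
        # urljoin resolves relative paths against the page address
        links.append(urljoin(base_url, a["href"]))
    return links


sample = '<a href="/lists/">Lists</a> <a name="top">Top</a>'
print(extract_links(sample, with_scheme("www.example.com")))
```

To use it on a live page, pass requests.get(address).text as the html argument and the same address as base_url.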
When we run this program, it will ask us for a website to extract the URLs from:
Enter a website to extract the URL's from: www.pythonforbeginners.com
http://www.pythonforbeginners.com
http://www.pythonforbeginners.com/pythonoverviewstarthere/
http://www.pythonforbeginners.com/dictionary/
http://www.pythonforbeginners.com/pythonfunctionscheatsheet/
http://www.pythonforbeginners.com/lists/pythonlistscheatsheet/
http://www.pythonforbeginners.com/loops/
http://www.pythonforbeginners.com/pythonmodules/
http://www.pythonforbeginners.com/strings/
http://www.pythonforbeginners.com/sitemap/
http://www.pythonforbeginners.com/feed/
http://www.pythonforbeginners.com
....
....
....
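The output above lists http://www.pythonforbeginners.com more than once, because the script prints every href it finds. If you only want each address once, one possible approach is to filter the list while keeping the original order. The helper name unique_in_order is hypothetical and the URLs in the example are taken from the output above.

```python
def unique_in_order(urls):
    """Drop None values and repeats, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        if url is None or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result


found = [
    "http://www.pythonforbeginners.com",
    "http://www.pythonforbeginners.com/loops/",
    None,
    "http://www.pythonforbeginners.com",
]
for url in unique_in_order(found):
    print(url)
```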
I recommend that you read our introductory article, "Beautiful Soup 4 Python",
to learn more about Beautiful Soup.
More Reading
http://www.crummy.com/software/BeautifulSoup/
http://docs.python-requests.org/en/latest/index.html