9/28/2016 Web Scraping with BeautifulSoup

Mar. 09, 2016

Web Scraping with BeautifulSoup

Web Scraping
"Web scraping (web harvesting or web data extraction) is a computer software
technique of extracting information from websites."

HTML parsing is easy in Python, especially with the help of the BeautifulSoup library.
In this post we will scrape a website (our own) to extract all URLs.

Getting Started
To begin with, make sure that you have the necessary modules installed.

In the example below, we are using BeautifulSoup 4 and Requests on a system with
Python 2.7 installed.

Installing BeautifulSoup and Requests can be done with pip:

$ pip install requests

$ pip install beautifulsoup4
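
A quick way to confirm that both installs worked is to try importing the modules. The snippet below is only a minimal sketch; the helper name `is_installed` is made up for this illustration. Note that the `beautifulsoup4` package is imported under the name `bs4`.

```python
import importlib


def is_installed(module_name):
    """Return True if module_name can be imported in this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# beautifulsoup4 is the pip package name; bs4 is the module you import.
for name in ("requests", "bs4"):
    status = "is installed" if is_installed(name) else "is MISSING"
    print("%s %s" % (name, status))
```

If either line reports a missing module, rerun the matching pip command with the same Python interpreter you use to run your scripts.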

What is BeautifulSoup?

http://www.pythonforbeginners.com/pythonontheweb/webscrapingwithbeautifulsoup/ 1/8

At the top of their website, you can read: "You didn't write that awful page.
You're just trying to get some data out of it. Beautiful Soup is here to help.
Since 2004, it's been saving programmers hours or days of work on quick-turnaround
screen scraping projects."

BeautifulSoup Features:

BeautifulSoup provides a few simple methods and Pythonic idioms for navigating,
searching, and modifying a parse tree: a toolkit for dissecting a document and
extracting what you need. It doesn't take much code to write an application.
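
To make "navigating, searching, and modifying" concrete, here is a small sketch that runs against a made-up HTML string instead of a live page. The markup and variable names are assumptions chosen for the illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML, used only for this illustration.
html = """
<html><head><title>Sample page</title></head>
<body>
<p class="intro">Hello, <b>world</b>!</p>
<a href="/lists/">Lists</a>
<a href="/loops/">Loops</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigating: walk down the tree with attribute access.
page_title = soup.title.string

# Searching: find_all() returns every matching tag.
hrefs = [a.get("href") for a in soup.find_all("a")]

# Modifying: replace the text inside the <b> tag in place.
soup.b.string = "reader"
intro_text = soup.find("p", class_="intro").get_text()

print(page_title)
print(hrefs)
print(intro_text)
```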

BeautifulSoup automatically converts incoming documents to Unicode and outgoing
documents to UTF-8. You don't have to think about encodings, unless the document
doesn't specify an encoding and BeautifulSoup can't autodetect one.

Then you just have to specify the original encoding.
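
Specifying the original encoding is done with the `from_encoding` argument of the `BeautifulSoup` constructor. The sketch below assumes a short byte string that we happen to know is ISO-8859-8 (Hebrew); the sample bytes are illustrative only.

```python
from bs4 import BeautifulSoup

# Bytes encoded as ISO-8859-8, with no encoding declared in the markup.
markup = b"<h1>\xed\xe5\xec\xf9</h1>"

# from_encoding tells BeautifulSoup how to decode the incoming bytes.
soup = BeautifulSoup(markup, "html.parser", from_encoding="iso-8859-8")

heading = soup.h1.string              # incoming text, now a Unicode string
utf8_bytes = soup.h1.encode("utf-8")  # outgoing markup, encoded as UTF-8

print(soup.original_encoding)
print(repr(utf8_bytes))
```

`soup.original_encoding` reports the encoding BeautifulSoup ended up using, which is handy when you want to check what autodetection chose.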

BeautifulSoup sits on top of popular Python parsers like lxml and html5lib,
allowing you to try out different parsing strategies or trade speed for
flexibility.
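
The parser is chosen by name in the second argument to `BeautifulSoup`. lxml and html5lib are separate packages that must be installed on their own, while `html.parser` ships with Python. The sketch below tries the parsers in a preferred order and falls back when one is missing; the helper name `make_soup` and the order itself are assumptions for this example.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, parsers=("lxml", "html5lib", "html.parser")):
    """Parse html with the first parser in `parsers` that is installed."""
    for parser in parsers:
        try:
            return BeautifulSoup(html, parser), parser
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
    raise RuntimeError("none of the requested parsers is available")


soup, used = make_soup("<p>Unclosed paragraph")
print(used)
print(soup.p.get_text())
```

Different parsers can build slightly different trees from the same broken markup, so it is worth settling on one parser per project.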

Extracting URLs from any website
Now that we know what BS4 is and have installed it on our machine,
let's see what we can do with it.

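from bs4 import BeautifulSoup
import requests

# raw_input() is the Python 2.7 built-in; on Python 3, use input() instead.
url = raw_input("Enter a website to extract the URL's from: ")

# Fetch the page; the http:// scheme is added in front of what the user typed.
r = requests.get("http://" + url)
data = r.text

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, "html.parser")

# Print the href of every <a> tag (None if a tag has no href attribute).
for link in soup.find_all('a'):
    print(link.get('href'))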

When we run this program, it will ask us for a website to extract the URLs from:

Enter a website to extract the URL's from: www.pythonforbeginners.com
http://www.pythonforbeginners.com
http://www.pythonforbeginners.com/pythonoverviewstarthere/
http://www.pythonforbeginners.com/dictionary/
http://www.pythonforbeginners.com/pythonfunctionscheatsheet/
http://www.pythonforbeginners.com/lists/pythonlistscheatsheet/
http://www.pythonforbeginners.com/loops/
http://www.pythonforbeginners.com/pythonmodules/
http://www.pythonforbeginners.com/strings/
http://www.pythonforbeginners.com/sitemap/
http://www.pythonforbeginners.com/feed/
http://www.pythonforbeginners.com
....
....
....
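
The sample output lists the home page more than once, and the program prints each href exactly as it appears in the page, so relative links stay relative and anchors without an href print None. A possible refinement, sketched here under the assumption that you want unique absolute URLs, uses `urljoin` from the standard library. The function name `extract_links` and the sample markup are made up for this illustration.

```python
from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7


def extract_links(html, base_url):
    """Return unique absolute URLs from the <a href> tags in html, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    links = []
    # href=True matches only anchors that have an href attribute.
    for a in soup.find_all("a", href=True):
        absolute = urljoin(base_url, a["href"])  # resolve relative paths
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Made-up markup for illustration; no network access is needed.
sample = (
    '<a href="http://www.pythonforbeginners.com">Home</a>'
    '<a href="/loops/">Loops</a>'
    '<a name="top">No href here</a>'
    '<a href="http://www.pythonforbeginners.com">Home again</a>'
)
for link in extract_links(sample, "http://www.pythonforbeginners.com"):
    print(link)
```

To use this with a live page, pass `r.text` from the earlier program as `html` and the page address as `base_url`.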

I recommend that you read our introduction article, "Beautiful Soup 4 Python",
to get more knowledge and understanding about BeautifulSoup.

More Reading

http://www.crummy.com/software/BeautifulSoup/
http://docs.python-requests.org/en/latest/index.html



