TODAY'S AGENDA
Review basic screen scraper theory
Define what constitutes a difficult case
Demo some screen scraper tricks
Share a heartwarming moment, featuring CAPTCHAs!
Michael Schrenk
BIO:
BIO: My DEFCON History
[Timeline slide: DEFCON 1 through 17]
Previous talks:
Introduction to Writing Spiders & Agents
Online Corporate Intelligence
The Fabulous Executable Image Exploit
Today's Talk: Screen Scraper Tricks: Difficult Cases
My book
Review of traditional screen scraping
Manage cookies
[Slide images of difficult cases: a "FREE DOWNLOAD" link, DHTML, and a form "submit" field whose value is an obfuscated string]
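For a traditional scraper, "manage cookies" means capturing the cookies a server sets and sending them back on every later request, the way a browser would. The talk's own tooling was PHP; this minimal sketch swaps in Python's standard library (`http.cookiejar` and `urllib.request`) to show the idea. The helper name and the User-Agent string are placeholders chosen for illustration.

```python
import http.cookiejar
import urllib.request


def make_session_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Build an opener that remembers cookies across requests, like a browser session."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies from Set-Cookie response headers in the jar
    # and adds the matching Cookie header to each outgoing request.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Placeholder User-Agent (an assumption); some sites treat the Python default differently.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; example-scraper)")]
    return opener, jar


# Usage (not executed here; the example.com paths are placeholders):
#   opener, jar = make_session_opener()
#   opener.open("https://example.com/login")    # server sets a session cookie
#   opener.open("https://example.com/account")  # the cookie is sent back automatically
```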
Browser Macros
Browser plug-in
Readily available
Solves all the issues mentioned
Easily hacked beyond intended use
iMacros solves all of the difficult cases because an actual browser is used.
A few additional hacks make it a serious screen scraper tool.
INSTALL iMacros
Search for the iMacros add-on at addons.mozilla.org
RECORDING A MACRO
Once iMacros is installed, start the add-on and press Record
Enter a URL, fill in the form, and press Save
Press Stop
PLAYING A MACRO
Find the #Current.iim macro and press Play
Your macro will replay!
Switch to demo
This is a REALLY SIMPLE demo!
You need to trust me that it will also work in a much more complex environment (i.e. a difficult case)!
Create a macro template (text file)
Run a PHP program to convert the template into a macro
Run the macro
Substituting Variables
[Slide images: variables #01 through #08]
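The template-to-macro step was done by a PHP program in the talk. A minimal sketch of the same idea in Python, assuming (for illustration only) that the template marks each variable with a numbered token like `#01`, and that values containing spaces are written with iMacros's `<SP>` placeholder. The `fill_template` and `write_macro` names and the token format are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical token format for this sketch: "#" followed by exactly two digits.
# Names such as "#Current.iim" do not match, so they pass through untouched.
TOKEN = re.compile(r"#(\d{2})\b")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every #NN token with values["NN"], writing spaces as <SP>.

    A token with no supplied value raises KeyError, so a half-filled
    macro is never produced.
    """
    def substitute(match: re.Match) -> str:
        return values[match.group(1)].replace(" ", "<SP>")

    return TOKEN.sub(substitute, template)


def write_macro(template_path: Path, macro_path: Path, values: dict[str, str]) -> None:
    """Read a template text file and write the filled-in macro to macro_path."""
    macro_path.write_text(fill_template(template_path.read_text(), values))
```

For example, filling `"URL GOTO=#01\nTAG POS=1 TYPE=INPUT:TEXT ATTR=NAME:q CONTENT=#02\n"` with `{"01": "https://example.com/", "02": "screen scrapers"}` yields a macro whose last line ends in `CONTENT=screen<SP>scrapers`. Failing loudly on a missing value is a deliberate choice: a macro with a stray `#03` left in it would otherwise run and quietly submit the wrong data.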
iMacros Hints
'##################################################
' Set the maximum time (in seconds) to wait for a web page to load
SET !TIMEOUT 240
' Keep running the macro even if a command fails
SET !ERRORIGNORE YES
' Clear cookies (and the browser cache)
CLEAR
' Switch to browser tab 1 and close all other tabs
TAB T=1
TAB CLOSEALLOTHERS
' Don't load images (helpful when using Tor)
FILTER TYPE=IMAGES STATUS=ON
' Suppress the pop-up that displays extracted data
SET !EXTRACT_TEST_POPUP NO
'##################################################
A complete iMacros command reference is available at: wiki.imacros.net/Command_Reference
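When macros are generated by a program, it helps to emit the same header at the top of every one. A minimal sketch in Python that reproduces the header commands from the iMacros hints; `macro_preamble` and `with_preamble` are hypothetical helper names, and `timeout_s` and `block_images` are options of this sketch, not iMacros settings.

```python
def macro_preamble(timeout_s: int = 240, block_images: bool = True) -> str:
    """Return the standard header used at the top of every generated macro."""
    lines = [
        "' Generated preamble",
        f"SET !TIMEOUT {timeout_s}",      # page-load timeout in seconds
        "SET !ERRORIGNORE YES",           # continue past failed commands
        "CLEAR",                          # start from a clean cookie state
        "TAB T=1",
        "TAB CLOSEALLOTHERS",
    ]
    if block_images:
        lines.append("FILTER TYPE=IMAGES STATUS=ON")  # skip images (e.g. over Tor)
    lines.append("SET !EXTRACT_TEST_POPUP NO")        # no extraction pop-ups
    return "\n".join(lines) + "\n"


def with_preamble(body: str, **options) -> str:
    """Prepend the preamble unless the macro body already starts with it."""
    preamble = macro_preamble(**options)
    return body if body.startswith(preamble) else preamble + body
```

Making `with_preamble` idempotent means a macro can pass through the generator more than once without accumulating duplicate headers.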
Firefox/iMacros-equipped Harvester (XP, Ubuntu)
The Harvester periodically asks the Central Server for instructions
The Central Server tells the Harvester what to do
The Harvester runs an iMacros macro against the Target Website(s):
1. Request data
2. Save screens
3. Parse results
The Harvester updates the Central Server
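The harvester's side of this cycle can be sketched in Python. It is a minimal sketch under stated assumptions: the talk does not say how a harvester talks to the central server or launches iMacros, so the four callables (`ask_server`, `run_macro`, `parse`, `report`) are hypothetical hooks passed in rather than implemented, and the `Instruction` type is invented for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instruction:
    job_id: str
    macro: str  # iMacros script text for this job


def harvester_cycle(
    ask_server: Callable[[], Optional[Instruction]],
    run_macro: Callable[[str], str],
    parse: Callable[[str], dict],
    report: Callable[[str, dict], None],
) -> bool:
    """Run one job; return False if the server had nothing to do."""
    instruction = ask_server()                 # ask for instructions
    if instruction is None:
        return False
    saved_page = run_macro(instruction.macro)  # request data and save screens
    results = parse(saved_page)                # parse results
    report(instruction.job_id, results)        # update the central server
    return True


def poll(cycle: Callable[[], bool], idle_sleep_s: float = 30.0,
         max_cycles: Optional[int] = None) -> int:
    """Call cycle() repeatedly, sleeping when idle; return the number of jobs run."""
    jobs_run = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        if cycle():
            jobs_run += 1
        else:
            time.sleep(idle_sleep_s)
    return jobs_run
```

A harvester would then run something like `poll(functools.partial(harvester_cycle, ask_server, run_macro, parse, report))`. Sleeping only when idle keeps a busy harvester working back to back while limiting how often an idle one polls the server.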
Large-scale deployment:
Many Harvesters send website requests to the Target Website(s) and receive raw websites
The Central Server sends the Harvesters instructions or software updates
The Harvesters send data and/or scraping status back to the Central Server
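On the server side, the main bookkeeping is handing each job to exactly one harvester, pushing software updates to out-of-date harvesters, and putting failed jobs back in the queue. A minimal in-memory sketch in Python; the `CentralServer` class, the message dictionaries, and the version-string check are assumptions for illustration, and a real deployment would put this behind a network API with persistent storage.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CentralServer:
    software_version: str = "1"
    pending: deque = field(default_factory=deque)  # (job_id, macro) waiting to run
    assigned: dict = field(default_factory=dict)   # job_id -> (harvester_id, macro)
    results: dict = field(default_factory=dict)    # job_id -> {"harvester", "data"}

    def submit(self, job_id: str, macro: str) -> None:
        self.pending.append((job_id, macro))

    def next_instruction(self, harvester_id: str, harvester_version: str) -> Optional[dict]:
        """Answer a harvester's periodic request for instructions."""
        if harvester_version != self.software_version:
            # Out-of-date harvesters get a software update instead of work.
            return {"type": "update", "version": self.software_version}
        if not self.pending:
            return None
        job_id, macro = self.pending.popleft()
        self.assigned[job_id] = (harvester_id, macro)
        return {"type": "job", "job_id": job_id, "macro": macro}

    def report(self, job_id: str, status: str, data: Optional[dict] = None) -> None:
        """Record data and/or scraping status sent back by a harvester."""
        harvester_id, macro = self.assigned.pop(job_id)
        if status == "ok":
            self.results[job_id] = {"harvester": harvester_id, "data": data}
        else:
            # Requeue failed jobs so whichever harvester asks next can retry them.
            self.pending.append((job_id, macro))
```

Because harvesters pull work rather than having it pushed to them, adding capacity is just a matter of starting another harvester that asks for instructions.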
Switch to demo #2
You need to trust me that it will also work in a more complex environment (i.e. a difficult case)!
Heartwarming moment
ReCAPTCHA
Unlike OCR, these are solved by REAL people
Do a quick Google search for details
There are CAPTCHA solving services:
1. CAPTCHA displayed on web page
2. CAPTCHA image sent to service
3. CAPTCHA solved by human
4. Embedded text sent back to requestor
5. Text is entered in CAPTCHA textbox
CAPTCHA solved!
Unintentional consequences:
CAPTCHA spammers pay to digitize old documents
People in developing nations have jobs
In conclusion
Reviewed traditional screen scraper theory
Described web design technologies and techniques that create difficult cases for webbot/screen scraper developers
Saw that iMacros can solve most (if not all) difficult cases through:
Absolute browser emulation
Complete control (through hacks)
Looked at managing large-scale deployments
Thank you!
Questions?
www.schrenk.com
mike@schrenk.com
twitter.com/mgschrenk