Note: Fraud denotes a serious lack of ethics and is unacceptable behavior for a higher-education student and future professional. Any attempt at fraud may lead to failing the course, for both the facilitator and the offender.
XML and XML Manipulation, Java Message Service and Message Oriented Middleware
Objectives
Learn XML technologies. In particular, you will learn XML, XSD, XSL, XSLT and XPath. This project is mostly about XML processing.
Understand the technique of screen scraping. Screen scraping consists of parsing the information shown on a terminal so that a different system can use it. It is the technique used for application integration when the only access point to an application is its user interface (e.g., a venerable VT100 text terminal). Since web systems are ubiquitous nowadays, screen scraping is mostly used to gather and process information from web sites that do not expose APIs to the general public (or to their business partners).
Remember (or learn) how to use HTML parsers. These parsers read HTML code and create data structures that represent the web page, such as DOM (Document Object Model) documents. You may also need regular expressions to clean the data in the DOM document. Regular expressions are an extremely powerful mechanism for cleaning, gathering and processing data embedded in text files.
Learn how to create simple asynchronous, message-oriented applications.
Final Delivery
This assignment contains two parts. One is for training only and does not count for the evaluation. You should deliver only the other part. You must submit your project as a zip file using Inforestudante. Do not forget to associate your group partner with the submission.
The submission contents are:
- Source code of the requested applications, ready to compile and execute;
- A small report in PDF format (5 pages max) about the implementation of the project.
After submitting, you are required to register the (extra-class) effort spent solving the assignment. This step is mandatory. Please fill in the effort form at:
https://docs.google.com/spreadsheet/viewform?formkey=dG9KTWpla0dnRW1aQ1JNdzRVTUJJMFE6MA
Resources
Jsoup
Jsoup, a Java HTML parser with the best of DOM, CSS, and jQuery: http://jsoup.org
Manual at: http://jsoup.org/cookbook/
XML, XSD, XSL and XSLT
XML: http://www.w3schools.com/xml
XSD: http://www.w3schools.com/schema
XPath: http://www.w3schools.com/xpath
Chapter 2: Understanding XML, in J2EE 1.4 Tutorial: http://download.oracle.com/javaee/1.4/tutorial/doc/
JAXB Tutorial at Java.net: http://jaxb.java.net/tutorial/index.html
Trang: http://www.thaiopensource.com/relaxng/trang.html
Processing XML/XSLT in Java
Chapter 7: Extensible Stylesheet Language Transformations, in J2EE 1.4 Tutorial (especially the parts "How XPath Works" and "Transforming XML Data with XSLT"): http://download.oracle.com/javaee/1.4/tutorial/doc/
David Jacobs, "Rescuing XSLT from Niche Status: A Gentle Introduction to XSLT through HTML Templates": http://www.xfront.com/rescuing-xslt.html
G. Ken Holman, "What is XSLT?", in XML.COM (especially the part "Getting started with XSLT and XPath"): http://www.xml.com/lpt/a/2000/08/holman/index.html
Paul Grosso and Norman Walsh, "XSL Concepts and Practical Use", in NWalsh.COM: http://nwalsh.com/docs/tutorials/xsl/xsl/frames.html
Java Message Service
http://docs.oracle.com/javaee/6/api/
Introducing the Java Message Service: http://www.digilife.be/quickreferences/pt/introducing%20the%20java%20message%20service.pdf
Mark Richards, Richard Monson-Haefel, and David A. Chappell, Java Message Service: http://serek.eurotrip.pl/Android_books/Java%20PDF%20eBooks/2009%20-%20Java%20Message%20Service%202e%20(O'Reilly).pdf
JMS with JBoss AS 7: http://eai-course.blogspot.pt
JBoss download at: http://jboss.org/jbossas

Advice: Skim all the links above before reading anything in detail.
The recommended IDE is Eclipse IDE for Java EE Developers; however, you are free to use another one.
Note: The next section contains short examples of some of these technologies.
    <units>gflops</units>
    <spoof>EDGITest|fusion:EDGITest|fusion</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>past_workunits</metric_name>
    <timestamp>1308046204058</timestamp>
    <value>0.0</value>
    <type>uint32</type>
    <units>wus</units>
    <spoof>EDGITest|fusion:EDGITest|fusion</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>waiting_workunits</metric_name>
    <timestamp>1308046204059</timestamp>
    <value>0.0</value>
    <type>uint32</type>
    <units>wus</units>
    <spoof>EDGITest|dsp:EDGITest|dsp</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>success_rate</metric_name>
    <timestamp>1308046204061</timestamp>
    <value>1.0</value>
    <type>float</type>
    <units>percentage</units>
    <spoof>EDGITest|dsp:EDGITest|dsp</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>past_workunits_24_hours</metric_name>
    <timestamp>1308046204064</timestamp>
    <value>0.0</value>
    <type>uint32</type>
    <units>wus</units>
    <spoof>EDGITest|fusion:EDGITest|fusion</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>cpus_available</metric_name>
    <timestamp>1308046204066</timestamp>
    <value>0.0</value>
    <type>uint32</type>
    <units>cpus</units>
    <spoof>EDGITest|dsp:EDGITest|dsp</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>success_rate</metric_name>
    <timestamp>1308046204067</timestamp>
    <value>1.0</value>
    <type>float</type>
    <units>percentage</units>
    <spoof>EDGITest|fusion:EDGITest|fusion</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
  <metric_data>
    <metric_name>gflops</metric_name>
    <timestamp>1308046204092</timestamp>
    <value>0.0</value>
    <type>float</type>
    <units>gflops</units>
    <spoof>EDGITest|dsp:EDGITest|dsp</spoof>
    <direction>dgsg|boinc</direction>
  </metric_data>
</report>
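XPath is one of the technologies listed in the objectives, and a report like the one above is a convenient place to practice it. The following is a minimal sketch that uses only the JDK's built-in javax.xml.xpath API. The class name MetricQuery and the trimmed SAMPLE string (values copied from the report above) are illustrative choices, not part of the assignment. The sketch also assumes that metric names and spoof values contain no apostrophes, because they are concatenated directly into the XPath expressions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MetricQuery {
    // Trimmed sample; the values are copied from the report above.
    static final String SAMPLE = "<report>"
            + "<metric_data><metric_name>waiting_workunits</metric_name><value>0.0</value>"
            + "<spoof>EDGITest|dsp:EDGITest|dsp</spoof></metric_data>"
            + "<metric_data><metric_name>success_rate</metric_name><value>1.0</value>"
            + "<spoof>EDGITest|dsp:EDGITest|dsp</spoof></metric_data>"
            + "<metric_data><metric_name>success_rate</metric_name><value>1.0</value>"
            + "<spoof>EDGITest|fusion:EDGITest|fusion</spoof></metric_data>"
            + "</report>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the <value> of every metric_data element with the given metric_name.
    static double[] valuesOf(Document doc, String metricName) {
        String expr = "/report/metric_data[metric_name='" + metricName + "']/value";
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            double[] values = new double[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                values[i] = Double.parseDouble(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expr, e);
        }
    }

    // Uses the XPath count() function to count the metrics reported for one spoof.
    static int countBySpoof(Document doc, String spoof) {
        String expr = "count(/report/metric_data[spoof='" + spoof + "'])";
        try {
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(valuesOf(doc, "success_rate").length);           // 2
        System.out.println(countBySpoof(doc, "EDGITest|dsp:EDGITest|dsp")); // 2
    }
}
```

XSL stylesheets select nodes with XPath, so the same location paths and predicates can be reused there.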
2. Now use the XML Binding Compiler (xjc) command-line tool to generate Java classes that represent the XML Schema you generated. Then write a simple program that performs two functions: a) unmarshals the information in the example XML into Java objects (the generated classes will hold the information); b) marshals the same information, now held in Java objects, back to XML.
3. Write an XSL file that outputs the XML data as an HTML table. Use a web browser to apply and visualize the transformation (you could also use a Java library, such as Xalan, for this purpose).
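You can also check the transformation from Java. The JDK ships an XSLT 1.0 processor (XSLTC, which comes from the Apache Xalan project) behind the standard javax.xml.transform API, so the sketch below needs no extra library. It assumes a report whose metric_data elements contain metric_name, value and units, as in the example above. The stylesheet and the class name ReportToHtml are illustrative only; your own XSL may produce a different table.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReportToHtml {
    // Illustrative stylesheet: one table row per <metric_data> element.
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html' indent='no'/>"
          + "<xsl:template match='/report'>"
          + "<html><body><table border='1'>"
          + "<tr><th>Metric</th><th>Value</th><th>Units</th></tr>"
          + "<xsl:for-each select='metric_data'>"
          + "<tr><td><xsl:value-of select='metric_name'/></td>"
          + "<td><xsl:value-of select='value'/></td>"
          + "<td><xsl:value-of select='units'/></td></tr>"
          + "</xsl:for-each>"
          + "</table></body></html>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the resulting HTML as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><metric_data><metric_name>success_rate</metric_name>"
                   + "<value>1.0</value><units>percentage</units></metric_data></report>";
        System.out.println(transform(xml, XSL));
    }
}
```

For the browser route, the same stylesheet would be referenced from the XML file with an xml-stylesheet processing instruction, as in example 4.e below.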
4. [Extra training] Let's now try the Java-first approach. In this case, you will write the Java classes yourself and use the JAXB annotations. Check the tutorial first, then use JAXB to output the following XML documents:
a)
<?xml version="1.0" encoding="UTF-8"?>
<class>
  <student>
    <name>Alberto</name>
    <age>21</age>
  </student>
  <student>
    <name>Patricia</name>
    <age>22</age>
  </student>
  <student>
    <name>Luis</name>
    <age>21</age>
  </student>
</class>
b)
<?xml version="1.0" encoding="UTF-8"?>
<class>
  <student id="201134441110">
    <name>Alberto</name>
    <age>21</age>
  </student>
  <student id="201134441116">
    <name>Patricia</name>
    <age>22</age>
  </student>
  <student id="201134441210">
    <name>Luis</name>
    <age>21</age>
  </student>
</class>
c)
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated automatically. Don't change it. -->
<class xmlns="http://www.dei.uc.pt/EAI">
  <student xmlns="" id="201134441110">
    <name>Alberto</name>
    <age>21</age>
  </student>
  <student xmlns="" id="201134441116">
    <name>Patricia</name>
    <age>21</age>
  </student>
  <student xmlns="" id="201134441210">
    <name>Luis</name>
    <age>21</age>
  </student>
</class>
d)
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated automatically. Don't change it. -->
<h:class xmlns:h="http://www.dei.uc.pt/EAI">
  <student id="201134441110">
    <name>Alberto</name>
    <age>21</age>
  </student>
  <student id="201134441116">
    <name>Patricia</name>
    <age>21</age>
  </student>
  <student id="201134441210">
    <name>Luis</name>
    <age>21</age>
  </student>
</h:class>
e)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<!-- Generated automatically. Don't change it. -->
<h:class xmlns:h="http://www.dei.uc.pt/EAI">
  <h:student id="201134441110">
    <name>Alberto</name>
    <age>21</age>
  </h:student>
  <h:student id="201134441116">
    <name>Patricia</name>
    <age>21</age>
  </h:student>
  <h:student id="201134441210">
    <name>Luis</name>
    <age>21</age>
  </h:student>
</h:class>
5. [Extra training] Try to write the XML Schema Definition for the XML of exercise 4.e) by hand.
6. How do queues behave when there is no receiver? Do they keep the messages or drop them? Also, what happens if two receivers exist for the same queue? (Relate this question to the 2nd execution question, above.)
7. Consider now that you only want to receive messages concerning the Enterprise Application Integration course. How can you filter out the remaining ones? Will this work for both queues and topics? Implement a working example.
8. Assume now that you need to send an XML message to a topic. Which kind of JMS message should you use? (You do not need to implement this exercise.)
9. Explain the parameters of the method createTopicSession(). What are the different types of acknowledgment available, and how do they differ?
10. Explain the difference between persistent messages and durable subscriptions.
[Figure: Web Crawler, JMS Topic, HTML Summary Creator]
The Web Crawler is a stand-alone command-line application that reads a web page and sends an XML message (carrying some of the page's contents) to a JMS Topic. You should use an HTML parser (e.g., Jsoup) to get the data from the web page. You should not parse the web page directly with regular expressions. You are, however, allowed to use regular expressions to extract small pieces of data from the results of the HTML parser. For example, you might find a string that looks like "val: 3.11" and use a regular expression to extract the 3.11.
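The "val: 3.11" case can be sketched with java.util.regex. The sketch assumes the string has already been obtained from the HTML parser (for instance, as an element's text). The pattern and the class name ValueExtractor are illustrative assumptions; adapt the pattern to the labels your site actually uses.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueExtractor {
    // "val:", optional whitespace, then a decimal number captured in group 1.
    private static final Pattern VALUE = Pattern.compile("val:\\s*(\\d+(?:\\.\\d+)?)");

    // Returns the first number that follows "val:", or empty if there is none.
    static Optional<Double> extract(String text) {
        Matcher m = VALUE.matcher(text);
        return m.find() ? Optional.of(Double.parseDouble(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("val: 3.11"));     // Optional[3.11]
        System.out.println(extract("no value here")); // Optional.empty
    }
}
```

Compiling the Pattern once and reusing it is cheaper than calling String.matches repeatedly. The regex only ever sees a short, already-isolated string, which keeps it within the use the assignment allows.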
Once you have the DOM document of the web page, you will need to convert it to XML. You can do this as follows:
- Define the XML schema (this may involve using the trang tool to create the XSD from sample XML). You must include an XML schema file (XSD) in your final submission and be ready to explain it;
- From the XML schema, generate the Java classes using the XML binding compiler (xjc);
- Once you have Java classes that can hold the data, instantiate and use them in the normal way in the Web Crawler source code.
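Once you have an XSD, you can also use it to check the messages the Crawler produces (or the Stats Producer receives) before trusting them. The sketch below uses the JDK's javax.xml.validation API. The schema itself (a movies root whose movie elements have title, director and category children) is a hypothetical placeholder built from fields mentioned in this assignment, not a required format. If you marshal with JAXB, the Marshaller and Unmarshaller also accept a compiled Schema through setSchema, which gives the same check.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class MessageValidator {
    // Hypothetical schema: <movies> holds zero or more <movie> elements, each with
    // a title, a director and any number of categories. Your own XSD will differ.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='movies'><xs:complexType><xs:sequence>"
          + "<xs:element name='movie' minOccurs='0' maxOccurs='unbounded'>"
          + "<xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='director' type='xs:string'/>"
          + "<xs:element name='category' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType>"
          + "</xs:element>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    // Compiles the XSD once; a broken schema is a programming error, so fail loudly.
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid XSD", e);
        }
    }

    // True if the message is well-formed and conforms to the schema.
    static boolean isValid(String xml, Schema schema) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(XSD);
        String ok = "<movies><movie><title>Some title</title><director>Someone</director>"
                  + "<category>Drama</category></movie></movies>";
        String missingDirector = "<movies><movie><title>Some title</title></movie></movies>";
        System.out.println(isValid(ok, schema));              // true
        System.out.println(isValid(missingDirector, schema)); // false
    }
}
```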
Note on the web page: you can use, for example, http://www.imdb.com/movies-coming-soon/2013-12/, or you can choose your own site. In the latter case, you must validate it with your Professor before starting.
Each time the Crawler runs, it parses the web page, creates and populates the Java objects that hold the web site's data, writes an XML document into a JMS message, and publishes this message on a JMS Topic. If the topic is down for some reason, you may want to keep a log of the messages the Crawler was unable to publish, so that it can retry later. You are responsible for defining the format of the XML messages (please read the whole assignment before starting). In general, however, each message must contain a list of movies, each movie carrying further information. This information must include: movie title, director, , categories (Drama, Comedy, Thriller, etc.). If your website is missing some data you find interesting for the assignment, you can add it to the XML, provided that you contact the Professor first.
Although you only need one site and HTTP access, design your Crawler so that:
- Changing the web site does not require too much effort;
- Changing to another input data source (e.g., FTP, file access) is simple.
Finally, keep some test HTML files on your disk, in case the website changes.
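One way to meet the two design goals above is to hide where the HTML comes from behind a small interface, and to keep the site-specific parsing in a separate one. This is a sketch under assumptions. PageSource, HttpPageSource, FilePageSource, PageParser and Crawler are hypothetical names. The HTTP and file code requires Java 11 or later. A real PageParser would wrap Jsoup and return the xjc-generated objects. FilePageSource also covers the advice to keep test HTML files on disk.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical: where the HTML comes from (HTTP, a local file, FTP, ...).
interface PageSource {
    String fetch() throws IOException;
}

// Hypothetical: site-specific parsing; a real one would use Jsoup and
// return the xjc-generated objects.
interface PageParser<T> {
    T parse(String html);
}

class HttpPageSource implements PageSource {
    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
    private final URI uri;

    HttpPageSource(String url) {
        this.uri = URI.create(url);
    }

    @Override
    public String fetch() throws IOException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IOException("HTTP " + response.statusCode() + " for " + uri);
            }
            return response.body();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while fetching " + uri, e);
        }
    }
}

// Reads a saved copy of the page, e.g. one of the test HTML files kept on disk.
class FilePageSource implements PageSource {
    private final Path path;

    FilePageSource(Path path) {
        this.path = path;
    }

    @Override
    public String fetch() throws IOException {
        return Files.readString(path);
    }
}

class Crawler<T> {
    private final PageSource source;
    private final PageParser<T> parser;

    Crawler(PageSource source, PageParser<T> parser) {
        this.source = source;
        this.parser = parser;
    }

    // One crawl: fetch the page and hand it to the site-specific parser.
    T crawlOnce() throws IOException {
        return parser.parse(source.fetch());
    }

    public static void main(String[] args) throws IOException {
        PageSource source = args.length > 0
                ? new FilePageSource(Path.of(args[0]))
                : new HttpPageSource("http://www.imdb.com/movies-coming-soon/2013-12/");
        // Placeholder parser: reports the page size instead of building movie objects.
        Crawler<Integer> crawler = new Crawler<>(source, String::length);
        System.out.println("Fetched " + crawler.crawlOnce() + " characters");
    }
}
```

With this split, supporting FTP would mean adding one more PageSource implementation, and changing site would mean writing a new PageParser. The Crawler class itself would not change.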
Stats Producer
The purpose of this application is to keep track of the top N rated movies of all time (based on the Metascore rating available on the IMDB web page; higher values are better). In a real scenario, this application would perform more complex analyses and produce rich statistics (probably using information from different sources). For the sake of simplicity, you are only required to store the information about the top N movies. For example, you can store on disk the 3 movies with the highest Metascore (considering all movie information received by this application). You can also assume that the movie title is a unique ID, if needed. You are free to choose your output file format, but prepare your application so that changing the output format is easy. Finally, the Stats Producer should keep a durable subscription on the Topic, so that it reads all of the Crawler's messages even if the Stats Producer fails.
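The top-N bookkeeping of the Stats Producer can be kept independent of JMS and of the output file format. The sketch below is one possible approach, not a prescribed design. TopMovies and TopListFormatter are hypothetical names. Titles are treated as unique IDs (as the assignment allows), and only the N best titles are kept in memory. The JMS side (the durable subscriber that would feed add()) is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical output abstraction: one implementation per file format.
interface TopListFormatter {
    String format(List<Map.Entry<String, Integer>> ranking);
}

class TopMovies {
    // Highest Metascore first; ties broken alphabetically by title.
    private static final Comparator<Map.Entry<String, Integer>> RANKING =
            Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey());

    // One possible format: "title;metascore" lines.
    static final TopListFormatter CSV = ranking -> ranking.stream()
            .map(e -> e.getKey() + ";" + e.getValue())
            .collect(Collectors.joining("\n"));

    private final int n;
    private final Map<String, Integer> scores = new HashMap<>(); // title -> Metascore

    TopMovies(int n) {
        this.n = n;
    }

    // Records a movie (title used as unique ID) and keeps only the n best.
    void add(String title, int metascore) {
        scores.put(title, metascore);
        if (scores.size() > n) {
            // The "largest" entry under RANKING is the one ranked last.
            String worst = Collections.max(scores.entrySet(), RANKING).getKey();
            scores.remove(worst);
        }
    }

    // Current top list, best first (a snapshot, not a live view of the map).
    List<Map.Entry<String, Integer>> top() {
        return scores.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), e.getValue()))
                .sorted(RANKING)
                .collect(Collectors.toList());
    }

    // Writes the current top list using whichever format is passed in.
    void save(Path file, TopListFormatter formatter) throws IOException {
        Files.writeString(file, formatter.format(top()));
    }

    public static void main(String[] args) throws IOException {
        TopMovies top3 = new TopMovies(3);
        // Hypothetical titles and scores; in the Stats Producer they would come
        // from the Crawler's messages.
        top3.add("Movie A", 74);
        top3.add("Movie B", 88);
        top3.add("Movie C", 61);
        top3.add("Movie D", 90);
        top3.save(Path.of("top-movies.txt"), CSV);
        System.out.println(CSV.format(top3.top()));
    }
}
```

To switch to another output format, supply another TopListFormatter (for example, one that produces XML). The ranking logic does not need to change.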
Grading
Grading is performed according to:
- The quality of the data model used to represent the data (XML/XSD);
- The quality of the code (modularity, formatting, comments, code conventions, etc.);
- Simplicity of the solution, including the screen scraping part;
- Final presentation of the work.
The project should be done in groups of 2 students. In your final report, state explicitly who was mostly involved in which part. We also expect all members of the group to be fully aware of all parts of the submitted code. Work together!