
HTML Src List Assignment (50 pts)

The purpose of this assignment is to become familiar with using pointers to access data in an array.

Functional Requirements

The general idea of this program is to use various commands to analyze the content of HTML files. All of the
commands relate to HTML tags that have a src attribute.

In more detail, write a program that:

1. Prints "URL:\n" to the standard output.

2. Gets the URL of the HTML document to analyze from the standard input.

3. Reads the HTML document into a string in memory.


1. You can read the output from an external application as if it were a file. The key to this feature is
the popen function. Once you've popen'ed an application, you can read the application's output with
commands like fgets().
2. To read an HTML document from the web, you can popen the curl application. To see how curl
works, try typing curl -s www.sbcc.edu at the command line. Note that it outputs the source
code of SBCC's home page.
3. Now, to run curl and access its output in your program:

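FILE *fp = popen("curl -s www.sbcc.edu", "r");  /* start curl and open a pipe to its output */
char buffer[BUFSIZ];

if (fp != NULL) {
    fgets(buffer, BUFSIZ, fp);                  /* reads one line of curl's output */
    pclose(fp);                                 /* close the pipe and wait for curl */
}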

4. Note that the fgets() in this code fragment just reads the first line from the HTML file into buffer.
You will need to write a loop that fgets() a line and strcat()'s it to the end of a big string containing
all lines of the HTML source code.
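The read loop described above might be sketched as follows. This is only one possible shape, not the required solution: the function name read_command_output, the MAX_HTML constant, and the decision to stop reading when the buffer is full are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes popen()/pclose() under strict -std modes */
#include <stdio.h>
#include <string.h>

#define MAX_HTML (256 * 1024)     /* hypothetical name; 256 kB is the stated maximum */

/* Runs `command` with popen() and appends each line of its output to `html`,
 * which has room for `size` bytes. Returns the number of characters stored,
 * or -1 if the command could not be started. */
long read_command_output(const char *command, char *html, size_t size)
{
    char line[BUFSIZ];
    size_t used = 0;
    FILE *fp;

    if (size == 0)
        return -1;
    fp = popen(command, "r");
    if (fp == NULL)
        return -1;

    html[0] = '\0';
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);
        if (used + len >= size)        /* stop before overflowing the buffer */
            break;
        strcat(html + used, line);     /* append at the current end of the string */
        used += len;
    }
    pclose(fp);
    return (long)used;
}
```

A caller could build the command with snprintf(command, sizeof command, "curl -s %s", url) and pass a static char array of MAX_HTML + 1 bytes as html. Calling strcat on html + used rather than on html avoids rescanning the whole string for every line.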
4. While the quit command has not been received:
1. Prints "Ready\n" to the standard output.

2. Gets a command to execute from the standard input. The command will be one of the following
(followed by a '\n'):
1. c = count. Prints the number of valid src attributes found in the HTML document to the
standard output. "Valid" src attributes are described below under non-functional
requirements.
2. t = tags. Each time a valid src attribute is found, print the associated tag name plus a "\n" to
the standard output.
3. u = urls. Each time a valid src attribute is found, print the associated URL plus a "\n" to the
standard output.
4. f = frequencies. Extra Credit. Write a list of tag names and tag counts (as with other
commands only include tags with valid src attributes) to the standard output. Separate the
names and counts by a \t; one tag+count combo per line. The order of the tag names in the
file must match the order in which tags are first encountered in the HTML source file (top to
bottom).
5. q = quit.

3. Perform the requested command.

5. Prints "Complete\n" to stdout.
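The Ready/command/Complete cycle in steps 4 and 5 could be organized roughly as below. This is a sketch under assumptions: run_commands is a hypothetical name, the streams are passed in as parameters only so the loop can be exercised without a terminal, and the per-command work is left as placeholder comments.

```c
#include <stdio.h>

/* Prints "Ready" before each read, dispatches on the first character of the
 * line, and prints "Complete" after 'q' or end of input. Returns the number
 * of recognized commands other than quit. */
int run_commands(FILE *in, FILE *out, const char *html)
{
    char line[BUFSIZ];
    int handled = 0;

    (void)html;   /* the real handlers would scan this string */

    for (;;) {
        fprintf(out, "Ready\n");
        if (fgets(line, sizeof line, in) == NULL || line[0] == 'q')
            break;

        switch (line[0]) {
        case 'c':   /* placeholder: print the number of valid src attributes */
        case 't':   /* placeholder: print one tag name per valid src */
        case 'u':   /* placeholder: print one URL per valid src */
        case 'f':   /* placeholder: print tag frequencies (extra credit) */
            handled++;
            break;
        default:    /* unknown command: ignored in this sketch */
            break;
        }
    }
    fprintf(out, "Complete\n");
    return handled;
}
```

In the actual program the call would presumably be run_commands(stdin, stdout, html), with each placeholder replaced by a call to its own function.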

Non-Functional Requirements
1. Your project name must follow the pattern: {FLname}HtmlSrcList, where {FLname} is replaced by the
first letter of your first name plus your last name. E.g. if your name is Maria Marciano, your project name
must be MMarcianoHtmlSrcList. If your project name does not follow this pattern, it will not be graded.

The largest HTML source string your program needs to be able to handle is 256kB.

2. Some assumptions you can make:


1. Assume valid xhtml. I.e. don't worry about things like a src attribute having a starting quote but not
an ending quote.
2. All src urls have double-quotes around them.
3. Assume tags and attributes are lower case.
4. There aren't any nasty CDATA sections in the HTML file.

3. Valid src attributes


1. When searching for src attributes, search for src=\". However, only count src attributes that have at
least one white space character before the s of src.
2. In particular, do not count matches like .src=\". Though they may be valid code (usually
JavaScript), we are not counting them here.
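One way to apply the validity rule above with pointers is sketched here. The function names find_valid_src, count_valid_src, and print_urls are hypothetical, and the code relies on the stated assumptions (lower-case attributes, double-quoted URLs).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns a pointer to the 's' of the next valid src=" at or after `p`,
 * or NULL if there is none. `start` is the beginning of the document. */
const char *find_valid_src(const char *start, const char *p)
{
    while ((p = strstr(p, "src=\"")) != NULL) {
        if (p > start && isspace((unsigned char)p[-1]))
            return p;               /* preceded by white space: valid */
        p++;                        /* e.g. ".src=\"": skip it and keep looking */
    }
    return NULL;
}

/* Counts the valid src attributes in `html`. */
int count_valid_src(const char *html)
{
    int count = 0;
    const char *p = html;

    while ((p = find_valid_src(html, p)) != NULL) {
        count++;
        p++;                        /* resume the search after this match */
    }
    return count;
}

/* Prints the URL of each valid src attribute, one per line. */
void print_urls(const char *html, FILE *out)
{
    const char *p = html;

    while ((p = find_valid_src(html, p)) != NULL) {
        const char *url = p + 5;                /* skip over src=" */
        const char *end = strchr(url, '"');     /* closing quote */
        if (end == NULL)
            break;
        fprintf(out, "%.*s\n", (int)(end - url), url);
        p = end;
    }
}
```

The "%.*s" conversion prints only the characters between the quotes, so the URL does not have to be copied into a separate buffer first.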

4. An acceptable solution for finding the tag that a src attribute belongs to is:
1. Starting at the src attribute, back up until you find a <.
2. Parse the word following the <.
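The two steps above might look like this in C. The name tag_name_at and its parameters are assumptions for illustration; src is expected to point at the 's' of a src attribute somewhere inside html.

```c
#include <ctype.h>
#include <stddef.h>

/* Backs up from `src` to the nearest '<' and copies the word that follows
 * it into `name` (capacity `size`). Returns 1 on success, or 0 if no '<'
 * precedes `src`. */
int tag_name_at(const char *html, const char *src, char *name, size_t size)
{
    const char *p = src;
    size_t n = 0;

    if (size == 0)
        return 0;

    while (p > html && *p != '<')    /* step 1: back up until a '<' is found */
        p--;
    if (*p != '<')
        return 0;

    p++;                             /* step 2: first character of the tag name */
    while (*p != '\0' && !isspace((unsigned char)*p) &&
           *p != '>' && *p != '/' && n + 1 < size)
        name[n++] = *p++;
    name[n] = '\0';
    return 1;
}
```

For example, given the text <img alt="x" src="a.png"/> and a pointer to the s of src, the function stores img in name.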

5. Your source code file must be named main.c.

6. main.c must be properly formatted (use Ctrl-Shift-f early and often).

7. The first executable line of your program must turn off output buffering. This is a requirement for
automated testing. The following line will do the trick:

setvbuf(stdout, NULL, _IONBF, 0);

8. Your program must be organized into a set of cohesive functions.

9. Your program must be properly commented.

Sample Input and Output (simple.html)


URL:
http://wfs.sbcc.edu/staff/nfguebels/web/cs137/html_src_list/simple.html
Ready
c
6
Ready
t
script
img
script
img
img
iframe
Ready
u
sbcc_files/udm-custom.js
sbcc_files/top10_homepage.png
sbcc_files/udm-dom.js
sbcc_files/top2_homepage.png
sbcc_files/home_transfer.png
javascript:''
Ready
f
script 2
img 3
iframe 1
Ready
q
Complete
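
For the extra-credit f command, the first-seen ordering shown in the sample above could be kept with a small table of name/count pairs. This is a sketch, not the required design: struct tag_count, tally_tag, print_frequencies, and the MAX_TAGS and MAX_NAME limits are all assumptions.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TAGS 64    /* assumed upper bound on distinct tag names */
#define MAX_NAME 32    /* assumed upper bound on tag-name length */

struct tag_count {
    char name[MAX_NAME];
    int count;
};

/* Records one occurrence of `name`. A new name is appended to the end of
 * the table, which preserves first-encountered order. Returns 0 if the
 * table is full, 1 otherwise. */
int tally_tag(struct tag_count *table, int *used, const char *name)
{
    int i;

    for (i = 0; i < *used; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].count++;
            return 1;
        }
    }
    if (*used == MAX_TAGS)
        return 0;
    snprintf(table[*used].name, MAX_NAME, "%s", name);
    table[*used].count = 1;
    (*used)++;
    return 1;
}

/* Prints one "name<TAB>count" line per table entry. */
void print_frequencies(const struct tag_count *table, int used, FILE *out)
{
    int i;

    for (i = 0; i < used; i++)
        fprintf(out, "%s\t%d\n", table[i].name, table[i].count);
}
```

Calling tally_tag once per valid src attribute, in document order, and then print_frequencies yields the tags in the order they were first encountered.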
Test/Grade your program:
1. Download (right-click, Save Link As... ) build.xml into your project directory.

2. In a terminal window, use cd to navigate to your project directory. Type grade to grade your program.

Scoring
testCount: 30 pts
testTags: 5 pts
testUrls: 5 pts
testMultipleCommands: 5 pts
testSourceCode: 5 pts
testFrequencies: 5 pts extra credit
