Overview | Schedule | Announcements | Resources | Assignments | Home |
After many years of working with computers, Professor X, a computer scientist at a small midwestern college, recently went off the deep end, deciding to give it all up and go on the road playing banjo as a member of a bluegrass band. Expecting to earn little income from his new venture, he has an unusual and innovative idea for a business on the Web, a new dot-com that will essentially run itself, allowing him ample time for his new pursuit.
He is keeping his idea a secret, but figures that if he can easily monitor web activity, he will be able to make minor changes to the web site to keep the business going. Web activity data are stored in several places, one of them being the web log. He has a list of reports that he would like to generate and, because of your experience analyzing web logs, thinks you are the person to write the software application that will allow him to do so.
Since it has been a while since you actually worked with a web log, here is a description of the log entries you will be examining:
207.46.98.33 - - [01/Nov/2004:04:28:32 -0500] "GET /~runner/csc121/resources.htm HTTP/1.0" 200 3973
207.46.98.33 is the IP address of the client (e.g., Internet Explorer) which made this request of the web server.
- - are always dashes in this log, but otherwise could contain information about the client, such as userid.
[01/Nov/2004:04:28:32 -0500] is the time the server finished processing the request. The format is always the same: [dd/mmm/yyyy:hh:mm:ss (+ or -)zzzz].
"GET /~runner/csc121/resources.htm HTTP/1.0" is the actual request from the client. GET is the method used, /~runner/csc121/resources.htm is the requested web page, and HTTP/1.0 is the protocol used by the client.
200 is the status code sent to the client from the server. Codes beginning in 2 indicate the page was successfully delivered to the client; codes beginning in 4 indicate an error caused by the client (a nonexistent page was requested, for example).
3973 is the size of the page returned to the client.
Notice that each item above is separated from the others by a space. Note also that some of the items themselves contain spaces. If you were to read a line from the log file, one way to split the line into pieces would be to use the split method, as in line.split(" "). Then you would get an array of Strings with 10 elements, element 0 being "207.46.98.33", elements 1 and 2 both being "-", element 3 being "[01/Nov/2004:04:28:32" and so on.
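As a small illustration of the split described above, the following sketch breaks the sample entry into its ten pieces and checks the status code. The class name LogSplitDemo and the helper delivered are made up for this example and are not part of the project.

```java
// Sketch only: LogSplitDemo and its method names are hypothetical.
class LogSplitDemo {
    // Splits one log entry on single spaces, giving ten Strings.
    static String[] fields(String line) {
        return line.split(" ");
    }

    // True when the status code (element 8) begins with 2,
    // meaning the page was successfully delivered.
    static boolean delivered(String[] f) {
        return f[8].startsWith("2");
    }

    public static void main(String[] args) {
        String line = "207.46.98.33 - - [01/Nov/2004:04:28:32 -0500] "
                + "\"GET /~runner/csc121/resources.htm HTTP/1.0\" 200 3973";
        String[] f = fields(line);
        System.out.println(f.length);      // number of pieces
        System.out.println(f[3]);          // date and time, with a leading [
        System.out.println(f[6]);          // requested web page
        System.out.println(delivered(f));  // status code check
    }
}
```

Because the request itself contains spaces, it is spread over elements 5, 6 and 7, so the requested page is element 6, the status code element 8, and the size element 9.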
Professor X would like the following information, for the time period spanned by the log, included in a report written to a text file:
The format of the report is up to you, but it should have a title and each of the requested statistics should be appropriately labeled and easily readable.
Copy the project I:\CSC121\public\webanalyzer to your own folder. The project has several classes, including one called Analyzer. This is an empty class which you will need to write from scratch for this assignment. Analyzer will have three main methods: input(Infile inf), process(), and output(Outfile outf). Each of these is described below. You are likely to define "helper" methods in addition to these three. A driver class will be calling your methods, so they need to have the prescribed interface and functionality.
The project folder also contains a web log, access_logcps.txt. This log has nine days' worth of data, November 1 - November 9, 2004.
public boolean input(Infile inf) - This method will use the read method of the Infile class (see that class interface) to add web log records to an ArrayList. The ArrayList will serve as a basis for analysis to be done by the process method. Each line of the input is a String and contains a full log record as described above. input() should return true or false according to whether an ArrayList of length greater than zero has been successfully constructed. After reading the last line from the input, you should close() the input file.
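The read-until-done loop described above might look roughly like the sketch below. The Infile interface is not reproduced on this page, so the sketch substitutes java.io.BufferedReader as a stand-in; in the real Analyzer you would call Infile's read method instead, and the way it signals the end of the input may differ. The class name InputSketch and the size helper are also made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;

// Sketch only: BufferedReader stands in for the project's Infile class.
class InputSketch {
    // One String per log record, filled in by input().
    private final ArrayList<String> records = new ArrayList<>();

    // Reads every line, closes the input, and reports whether
    // at least one record was stored.
    boolean input(BufferedReader in) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {      // skip blank lines
                    records.add(line);
                }
            }
            in.close();
        } catch (IOException e) {
            return false;                   // reading failed
        }
        return !records.isEmpty();
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        String log = "207.46.98.33 - - [01/Nov/2004:04:28:32 -0500] "
                + "\"GET /~runner/csc121/resources.htm HTTP/1.0\" 200 3973\n";
        InputSketch a = new InputSketch();
        System.out.println(a.input(new BufferedReader(new StringReader(log))));
        System.out.println(a.size());
    }
}
```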
public void process() - This method will construct, populate, and prepare the structures required for analysis. For example, in order to compute the peak day and hour statistic, you will likely construct a 2-D array of frequency counts of visits, day x hour (9 x 24). You should extract the day and hour from the time field of the web log entry. You might want to use the method parseInt from the Integer class to convert a String to an int. The syntax for using the method is:
int myInt = Integer.parseInt( <The String to convert> );
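One possible way to build the day x hour table is sketched below. It assumes that every log entry counts as one visit and that the day of the month (1 through 9) can be used directly as the row number; both are assumptions for illustration, as are the names PeakSketch, countByDayAndHour and peak.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: class and method names are hypothetical.
class PeakSketch {
    static final int DAYS = 9;    // the log spans November 1 - November 9
    static final int HOURS = 24;

    // Builds a day x hour table of counts from raw log lines.
    static int[][] countByDayAndHour(List<String> records) {
        int[][] counts = new int[DAYS][HOURS];
        for (String line : records) {
            String time = line.split(" ")[3];   // e.g. "[01/Nov/2004:04:28:32"
            String[] parts = time.split(":");   // "[01/Nov/2004", "04", "28", "32"
            int day = Integer.parseInt(parts[0].substring(1, 3));
            int hour = Integer.parseInt(parts[1]);
            counts[day - 1][hour]++;            // row 0 is November 1
        }
        return counts;
    }

    // Returns {day, hour} of the largest count; ties go to the earliest cell.
    static int[] peak(int[][] counts) {
        int bestDay = 0;
        int bestHour = 0;
        for (int d = 0; d < counts.length; d++) {
            for (int h = 0; h < counts[d].length; h++) {
                if (counts[d][h] > counts[bestDay][bestHour]) {
                    bestDay = d;
                    bestHour = h;
                }
            }
        }
        return new int[] { bestDay + 1, bestHour };
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        records.add("207.46.98.33 - - [01/Nov/2004:04:28:32 -0500] "
                + "\"GET /~runner/csc121/resources.htm HTTP/1.0\" 200 3973");
        int[] p = peak(countByDayAndHour(records));
        System.out.println("day " + p[0] + ", hour " + p[1]);
    }
}
```

Integer.parseInt accepts leading zeros, so "01" and "04" convert to 1 and 4 without any extra trimming.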
public void output(Outfile outf) - This method will use the write method of the Outfile class (see that class interface) to create a text file containing the results of the analysis. You should use the structures created by the process method to produce your report. You might want to produce the report in a terminal window before writing it to a text file. You should close the output file after writing the last line of output.
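To preview a report in a terminal window first, one option is to build the whole report as a String and print it, as in the sketch below. The title, the "Total requests" line and the names ReportSketch and buildReport are invented for the example; the Outfile interface is not shown here, so the final step of passing the text to its write method and closing the file is left out.

```java
// Sketch only: the report layout and all names here are hypothetical.
class ReportSketch {
    // Builds a titled report with one labeled line per statistic.
    static String buildReport(int totalRequests, int peakDay, int peakHour, int peakCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("Web Log Report\n");
        sb.append("==============\n");
        sb.append("Total requests: ").append(totalRequests).append('\n');
        sb.append("Peak day and hour: November ").append(peakDay)
          .append(", ").append(String.format("%02d:00", peakHour))
          .append(" (").append(peakCount).append(" requests)\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print to the terminal before writing anything to a file.
        System.out.print(buildReport(3, 1, 4, 2));
    }
}
```

Keeping the formatting in one method means the same text can be checked on screen and then written to the file unchanged.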
I:\CSC121\Project5 and send an email message to your instructor naming the members (1 or 2) of your team.
DePauw University, Computer Science Department, Spring 2005
Maintained by Brian Howard (bhoward@depauw.edu).