Introduction
URL stands for uniform resource locator. URLs are used on the internet to define files and the protocols with which they should be processed. Here's an example of a URL:
http://www.cs.cornell.edu/Courses/cs211/2001fa/index.html
It consists of
<protocol>://<authority><path>?<query>#<fragment>
We won't discuss or use the authority or the query.
Protocols
Here are the protocols one usually sees:
The appearance of "//" in the URL signals the beginning of a domain name, which is a name that has been registered as being assigned to or associated with a particular computer. Here are examples of domain names:
Ports
A URL can optionally specify a "port", which is the port number to which the TCP connection is made on the remote host machine. If the port is not specified, the default port for the protocol is used instead. For example, the default port for http is 80. An alternative port could be specified as:
http://www.ncsa.uiuc.edu:8080/demoweb/url-primer.html
We won't deal with ports, since they are usually not given in the URLs that we deal with.
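The default-port rule can be sketched in Java with class java.net.URL (covered in the Class URL section of this page). This is only an illustration: the class name PortDemo and the helper effectivePort are made up for this example. It relies on getPort returning -1 when the URL gives no port, and on getDefaultPort returning the default for the protocol.

```java
import java.net.MalformedURLException;
import java.net.URL;

class PortDemo {
    /** = the port that a connection to URL string s would use: the port
        written in s if there is one, otherwise the protocol's default. */
    static int effectivePort(String s) {
        try {
            URL u = new URL(s);
            int port = u.getPort();            // -1 if s does not specify a port
            return port != -1 ? port : u.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + s, e);
        }
    }

    public static void main(String[] args) {
        // prints 8080
        System.out.println(effectivePort("http://www.ncsa.uiuc.edu:8080/demoweb/url-primer.html"));
        // prints 80
        System.out.println(effectivePort("http://www.cs.cornell.edu/Courses/cs211/2001fa/index.html"));
    }
}
```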
Absolute paths
The path part of a URL is a path on the computer to the file that the URL describes. For example,
/~gries/Logic/Introduction.html
indicates that the hard drive of the computer whose domain name was given has a folder (directory) ~gries; in that folder is a folder named Logic, and in that folder is a file named Introduction.html.
The character / is used to separate entities on the path, regardless of the operating system that is running on the computer, whether Unix-like, Windows, or Macintosh.
If the file name is missing at the end (so that the last entity is a folder), then a default file is chosen, usually index.html or index.htm. This default depends on, and can be changed on, the computer on which the file resides. On some computers, we have seen the following defaults: default.html, default.htm, home.html, and home.htm.
Any path of any file or folder on your hard drive can be used in a URL. The form of the beginning of such a path depends on whether a Unix-like, Windows, or Macintosh operating system is being used. You can check this out on your own computer by loading any html file that is on your hard drive into a browser like Netscape Communicator or Internet Explorer and looking at the URL that is displayed. Here are examples for the three kinds of systems:
Within an html file, one can have a relative URL, as in href="people/faculty/faculty.htm".
The protocol and host are assumed to be the same as those of the current
html file, and the path is assumed to be relative to the folder in which
the html file appears. One can use .. in the path to move up in the path
of folders, as in all operating systems. For example, if the current folder
is /~gries/Logic, then relative path ../NoLogic/test.html refers to the
file /~gries/NoLogic/test.html.
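The resolution of a relative URL can be tried out in Java with the two-argument constructor of class java.net.URL (described in the Class URL section of this page). The sketch below is illustrative only: the class name RelativeDemo and the helper resolve are made up, and the host www.cs.cornell.edu is an assumption, since the example above gives only the path.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeDemo {
    /** = the absolute URL obtained by resolving relative URL rel against
        the URL current of the html file in which rel appears. */
    static String resolve(String current, String rel) {
        try {
            URL context = new URL(current);
            // Protocol and host come from context; rel is taken relative
            // to the folder that contains the current file.
            return new URL(context, rel).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("cannot resolve " + rel, e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.cs.cornell.edu/~gries/NoLogic/test.html
        System.out.println(resolve(
            "http://www.cs.cornell.edu/~gries/Logic/Introduction.html",
            "../NoLogic/test.html"));
    }
}
```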
Fragments
The following URL has the "fragment" #chapter1:
http://java.sun.com/index.html#chapter1
The fragment is often the name of a "target" within the file given by the URL, but there are other uses for it. Technically, the fragment is not part of the URL. For assignments in CS211 this Fall, targets should be discarded if present.
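One possible way to discard a fragment in Java is sketched below. It is not a prescribed solution for the assignments: the class name FragmentDemo and both helpers are made up for this example. It uses getRef of class java.net.URL, which returns the fragment without the leading # (or null if there is none).

```java
import java.net.MalformedURLException;
import java.net.URL;

class FragmentDemo {
    /** = the fragment of URL string s without its "#", or null if s has none. */
    static String fragmentOf(String s) {
        try {
            return new URL(s).getRef();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + s, e);
        }
    }

    /** = URL string s with its fragment (if any) removed. */
    static String withoutFragment(String s) {
        try {
            String text = new URL(s).toString();
            int hash = text.indexOf('#');      // start of the fragment, or -1
            return hash == -1 ? text : text.substring(0, hash);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + s, e);
        }
    }

    public static void main(String[] args) {
        String s = "http://java.sun.com/index.html#chapter1";
        System.out.println(fragmentOf(s));       // prints chapter1
        System.out.println(withoutFragment(s));  // prints http://java.sun.com/index.html
    }
}
```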
Class URL
Package java.net contains class URL, which is used for dealing with
URLs in Java. Below, we provide terse specs for a few of the most
useful methods in class URL. We don't discuss the authority, query,
and fragment parts of URLs. We also make the following comments.
First, if you have a variable url of class URL, you can get a BufferedReader
for it using the following procedure. The procedure returns null if the
protocol of the URL is neither http nor file (a file on a local computer),
or if the URL cannot be opened. Notice how openStream is called to obtain
an InputStream (which returns one byte at a time), then an InputStreamReader
(which turns bytes into characters) is created from that, and finally the
BufferedReader (which has a method to read one line at a time). This is
quite similar to dealing with input files on your computer.
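// Requires java.io.BufferedReader, java.io.InputStream,
// java.io.InputStreamReader, java.io.IOException, and java.net.URL.
/** = a reader for URL url (which must not be null). If the protocol
    is not http or file, or if the URL cannot be opened, null is returned.
    Note: url is assumed to be a field of the enclosing class. */
private BufferedReader getReader() {
    String protocol= url.getProtocol();
    if (!protocol.equals("http") && !protocol.equals("file")) {
        return null;
    }
    try {
        InputStream is= url.openStream();
        InputStreamReader isr= new InputStreamReader(is);
        return new BufferedReader(isr);
    } catch (IOException e) {
        return null;
    }
}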
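As a rough sketch of how such a reader might be used, the following self-contained example reads every line from a URL. The names ReadLinesDemo, readLines, and demo are made up for this example, and the text is assumed to be UTF-8. Unlike the procedure above, the sketch reports failures with an exception rather than returning null. To avoid needing a network connection, demo writes a temporary file and reads it back through a file URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadLinesDemo {
    /** = the lines of the document at url, in order. */
    static List<String> readLines(URL url) {
        List<String> lines = new ArrayList<>();
        // bytes -> characters -> lines, as in the procedure above
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {   // null signals end of input
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Write two lines to a temporary file and = those lines, read back
        through a URL with protocol file. */
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("url-demo", ".txt");
            Files.write(tmp, List.of("first line", "second line"), StandardCharsets.UTF_8);
            List<String> lines = readLines(tmp.toUri().toURL());
            Files.delete(tmp);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [first line, second line]
    }
}
```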
Next, when you have an absolute URL as a String s, use "new
URL(s)" to create an instance of URL for it. But if you
already have a URL c that is a directory, say, and you want to create
a URL for a file s that is within the directory, use
"new URL(c,s)".
Finally, given a URL u, use the various getter methods shown below to get components of it, usually as Strings.
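To make the getters concrete, here is a small sketch that builds a URL from a String and collects some of its components. The class name GetterDemo and the helper describe are made up for this example. The URLs are the ones used earlier on this page.

```java
import java.net.MalformedURLException;
import java.net.URL;

class GetterDemo {
    /** = a one-line summary of the components of URL string s. */
    static String describe(String s) {
        try {
            URL u = new URL(s);
            return "protocol=" + u.getProtocol()
                + ", host=" + u.getHost()
                + ", port=" + u.getPort()      // -1 if s gives no port
                + ", path=" + u.getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + s, e);
        }
    }

    public static void main(String[] args) {
        // prints protocol=http, host=www.ncsa.uiuc.edu, port=8080, path=/demoweb/url-primer.html
        System.out.println(describe("http://www.ncsa.uiuc.edu:8080/demoweb/url-primer.html"));
        // prints protocol=http, host=www.cs.cornell.edu, port=-1, path=/Courses/cs211/2001fa/index.html
        System.out.println(describe("http://www.cs.cornell.edu/Courses/cs211/2001fa/index.html"));
    }
}
```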
public class URL {
    // Constructor: a URL object constructed from its String
    // representation s. A call URL("...")
    // is equivalent to a call URL(null, "...").
    public URL(String s) throws MalformedURLException
    /* Constructor: a URL object constructed from URL c and String s.
       If s's path component is empty and its scheme, authority, and
       query components are undefined, then the new URL is a reference
       to the current document (the one given by c). Otherwise, any
       fragment and query parts present in s are used in the new URL.
       If s contains a protocol component and it does not match the
       protocol in c, then the new URL is created as an absolute URL
       based on s alone. Otherwise, the protocol component is inherited
       from the context URL c.
       If s's path component begins with a slash character "/", then
       the path is treated as absolute, and s's path replaces c's path.
       Otherwise, s's path is treated as a relative path and is appended
       to c's path. The path is canonicalized through the removal of
       directory changes made by occurrences of ".." and ".".
    */
    public URL(URL c, String s) throws MalformedURLException
    // = the file name for this URL
    public String getFile()
    // = the host, or domain name, for this URL
    public String getHost()
    // = the path for this URL
    public String getPath()
    // = the port for this URL (-1 if the port is not set)
    public int getPort()
    // = the protocol for this URL
    public String getProtocol()
    // = a newly opened connection to this URL, as an InputStream.
    // Shorthand for openConnection().getInputStream()
    public final InputStream openStream() throws IOException
    // = a representation of this URL
    public String toString()
}
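The rules in the spec of the two-argument constructor can be checked with a short experiment. The sketch below is illustrative only: the class name ContextDemo and the helper resolve are made up. One detail is easy to miss. When c is meant to be a directory, its path should end in "/". Otherwise the last entity of c's path is treated as a file name and is replaced by s.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ContextDemo {
    /** = the String form of new URL(c, s), where c is built from String context. */
    static String resolve(String context, String s) {
        try {
            return new URL(new URL(context), s).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("cannot resolve " + s, e);
        }
    }

    public static void main(String[] args) {
        String dir = "http://www.cs.cornell.edu/Courses/cs211/2001fa/";

        // Relative path: appended to the context path.
        // prints http://www.cs.cornell.edu/Courses/cs211/2001fa/index.html
        System.out.println(resolve(dir, "index.html"));

        // Path beginning with "/": replaces the context path.
        // prints http://www.cs.cornell.edu/~gries/Logic/Introduction.html
        System.out.println(resolve(dir, "/~gries/Logic/Introduction.html"));

        // Different protocol in s: the new URL is based on s alone.
        // prints https://java.sun.com/index.html
        System.out.println(resolve(dir, "https://java.sun.com/index.html"));

        // No trailing "/" on the context: "2001fa" is replaced, not extended.
        // prints http://www.cs.cornell.edu/Courses/cs211/index.html
        System.out.println(resolve("http://www.cs.cornell.edu/Courses/cs211/2001fa", "index.html"));
    }
}
```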