The Web uses a form of address known as a Uniform Resource Identifier (URI) to identify data objects on servers. The URI for an object is independent of the protocol used to access the data. An object's URI also says nothing definite about the type of data being identified. However, most URIs include a filename extension (similar to the DOS filename extension) that the client application can use as a hint for how to present the object to the user. For example, a Web site dedicated to gardening may have a file named roses.htm, which is most likely an HTML document, and perhaps a file named roses.gif, which is probably a picture of roses.
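The extension-as-hint idea can be sketched in a few lines of Python. This is only an illustration: the lookup table, the function name guess_presentation, and the /garden/ directory in the sample addresses are assumptions made for the example, not part of any real client.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table mapping filename extensions to a presentation hint.
EXTENSION_HINTS = {
    ".htm": "HTML document",
    ".html": "HTML document",
    ".gif": "GIF image",
}

def guess_presentation(uri: str) -> str:
    """Return a presentation hint based only on the filename extension."""
    path = urlsplit(uri).path              # e.g. "/garden/roses.htm"
    _, ext = posixpath.splitext(path)      # e.g. ".htm"
    return EXTENSION_HINTS.get(ext.lower(), "unknown")

print(guess_presentation("//myserver.com/garden/roses.htm"))
print(guess_presentation("//myserver.com/garden/roses.gif"))
```

The extension is only a hint; an address with no recognized extension falls through to "unknown" in this sketch.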
The more common form of a URI is known as a Uniform Resource Locator (URL). This is typically what you see in a list of Web pages. A URL is a URI that contains protocol information specifying how the data object should be retrieved from the server. The difference is subtle but important. Here are a few examples that should clear up any confusion you may have:
URI: //myserver.com/user1/default.htm
URL: http://myserver.com/user1/default.htm
URI: //ftp.myserver.com/demos/demo.zip
URL: ftp://ftp.myserver.com/demos/demo.zip
The first two examples illustrate the crucial difference between a URI and a URL. The URL tells the machine at address myserver.com to retrieve a file named default.htm from its /user1 directory and return it to the client using the HTTP protocol. The URI merely defines the location of the file, whereas the URL also specifies how it should be retrieved.
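One way to see the distinction is to split the example addresses into their parts. The sketch below uses urlsplit from Python's standard urllib.parse module; the helper name describe is an assumption made for illustration.

```python
from urllib.parse import urlsplit

def describe(address: str) -> tuple:
    """Split an address into (protocol, server, path)."""
    parts = urlsplit(address)
    return (parts.scheme, parts.netloc, parts.path)

for address in (
    "//myserver.com/user1/default.htm",       # no protocol given
    "http://myserver.com/user1/default.htm",  # retrieved with HTTP
    "ftp://ftp.myserver.com/demos/demo.zip",  # retrieved with FTP
):
    print(describe(address))
```

For the address without a protocol prefix, the first element of the result is an empty string; the server and path are the same as in the http:// form.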
Similar to DOS, the URI can contain either absolute or relative addressing. For instance, if you are viewing (or creating) a Web-based document with a URI of //myserver.com/user1/default.htm, you can use the following addressing mechanisms within the document:
* //myserver.com/home/file2.htm would specify an absolute path.
* file3.htm would specify a relative path equivalent to //myserver.com/user1/file3.htm.
* /file4.htm would specify a path relative to the server root, equivalent to //myserver.com/file4.htm.
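The three addressing forms in the list can be checked with urljoin from Python's standard urllib.parse module. This is a minimal sketch: the http:// prefix on the base address and the helper name resolve are assumptions added so the example is self-contained.

```python
from urllib.parse import urljoin

# Address of the document being viewed (http:// prefix assumed).
BASE = "http://myserver.com/user1/default.htm"

def resolve(reference: str) -> str:
    """Resolve a reference found inside the document against BASE."""
    return urljoin(BASE, reference)

for reference in ("//myserver.com/home/file2.htm", "file3.htm", "/file4.htm"):
    print(reference, "->", resolve(reference))
```

The first reference keeps its own server and path, the second is resolved against the /user1 directory, and the third is resolved against the root of the server.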
Another similarity to DOS in specifying a URI is that the URI cannot contain any spaces and must encode certain reserved characters. If a space is required within a URI, you must encode it as the string "%20". So, a directory named "My Documents", if used in a URI, would appear as "My%20Documents". Similarly, there are several reserved characters that have special meanings within a URI. These are outlined in Table 2.2. These reserved characters, if actually meant to appear within a URI, must be encoded in the same way: a percent sign followed by the two-digit hexadecimal value of the character in the ISO Latin-1 character set.
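The encoding rule can be tried out with quote and unquote from Python's standard urllib.parse module. This is a sketch under two assumptions: "?" is used as a sample reserved character (Table 2.2 has the full list), and the accented letter is included only to show the Latin-1 variant, since quote uses UTF-8 unless told otherwise.

```python
from urllib.parse import quote, unquote

# A space becomes %20.
encoded = quote("My Documents")
print(encoded)            # My%20Documents
print(unquote(encoded))   # My Documents

# A reserved character meant literally is encoded as % plus its hex code.
# safe="" tells quote not to leave any reserved character untouched.
print(quote("?", safe=""))             # %3F

# Characters outside ASCII: request ISO Latin-1 explicitly,
# because quote() defaults to UTF-8.
print(quote("é", encoding="latin-1"))  # %E9
```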
The http portion is used to indicate that the resource is to be retrieved using the HTTP protocol. The