Monday, September 04, 2006

Addressing on the Web

As mentioned in the preceding section, HTTP is used to transfer data objects from a server machine to a client. For this transfer to happen, the two applications involved in the conversation must recognize a common addressing mechanism. This addressing mechanism must uniquely identify every data object available not only on the server machine, but also on the entire network the applications are using for communication. The addressing scheme must also be familiar to programmers and publishers on the Web, because addresses are used both to gather and to publish information or services on the Web.

The Web uses a form of address known as a Uniform Resource Identifier (URI) to identify data objects on servers; older documents sometimes expand the name as "Universal Resource Identifier". The URI for an object is independent of which protocol is used to access the data. An object's URI also provides no real clue as to what type of data is being identified. However, most URIs include a filename extension (similar to the DOS filename extension) that the client application can use as a hint about how the object should be presented to the user. For example, a Web site dedicated to gardening may have a file named roses.htm, which is most likely an HTML document, and perhaps a file named roses.gif, which is probably a picture of roses.
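As a rough illustration of how a client might use the filename extension as a hint, here is a minimal Python sketch. The PRESENTATION_HINTS table and the guess_presentation function are hypothetical names invented for this example; real clients rely on much larger tables and on information the server sends with the object.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical lookup table, for illustration only.
PRESENTATION_HINTS = {
    ".htm": "HTML document",
    ".html": "HTML document",
    ".gif": "GIF image",
}

def guess_presentation(address):
    """Guess how to present an object from the extension in its address."""
    path = urlsplit(address).path                     # ignore host, query, etc.
    extension = posixpath.splitext(path)[1].lower()   # e.g. ".htm"
    return PRESENTATION_HINTS.get(extension, "unknown")

html_guess = guess_presentation("roses.htm")
image_guess = guess_presentation("roses.gif")
no_guess = guess_presentation("roses")
```

The extension is only a hint: nothing stops a server from storing a picture in a file named roses.htm, which is why the sketch falls back to "unknown" rather than assuming a type.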

The more common form of a URI is known as a Uniform Resource Locator (URL). This is typically what you see in a list of Web pages. A URL is a URI that contains protocol information specifying how the data object should be retrieved from the server. The difference is subtle but important. Here are a few examples that should clear up any confusion you may have:

URI: //myserver.com/user1/default.htm

URL: http://myserver.com/user1/default.htm

URI: //ftp.myserver.com/demos/demo.zip

URL: ftp://ftp.myserver.com/demos/demo.zip

The first two examples illustrate the crucial difference between a URI and a URL. The URL instructs the machine at address myserver.com to retrieve a file named default.htm from its /user1 directory and return it to the client using the HTTP protocol. The URI merely defines the location of the file, whereas the URL also specifies how it should be retrieved.
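The distinction drawn above can be seen by splitting the example addresses into their parts. The following is a minimal sketch using Python's standard urllib.parse module; the describe function is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit

def describe(address):
    """Split an address into the protocol, host, and path it names."""
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,   # empty string when no protocol is given
        "host": parts.hostname,
        "path": parts.path,
    }

with_protocol = describe("http://myserver.com/user1/default.htm")
without_protocol = describe("//myserver.com/user1/default.htm")
ftp_address = describe("ftp://ftp.myserver.com/demos/demo.zip")
```

Both of the first two forms name the same host and path; only the form with the http prefix says which protocol to use for retrieval.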

As with DOS paths, a URI can use either absolute or relative addressing. For instance, if you are viewing (or creating) a Web-based document with a URI of //myserver.com/user1/default.htm, you can use the following addressing mechanisms within the document:


* //myserver.com/home/file2.htm would specify an absolute path.

* file3.htm would specify a relative path equivalent to //myserver.com/user1/file3.htm.

* /file4.htm would specify a relative path equivalent to //myserver.com/file4.htm.
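The three cases listed above can be checked with urljoin from Python's standard urllib.parse module. This is a minimal sketch that assumes the containing document was retrieved over HTTP, so the base address carries an http prefix that the original examples leave out.

```python
from urllib.parse import urljoin

# Assumption: the document being viewed was fetched over HTTP.
base = "http://myserver.com/user1/default.htm"

# An address that names the host is used as given (it inherits the protocol).
absolute = urljoin(base, "//myserver.com/home/file2.htm")

# A bare filename is resolved against the directory of the current document.
same_directory = urljoin(base, "file3.htm")

# A leading slash resolves against the root of the current server.
server_root = urljoin(base, "/file4.htm")
```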

As with DOS filenames, a URI cannot contain spaces, and certain reserved characters must be encoded. If whitespace is required within a URI, you must encode the space as the string "%20". So, a directory named "My Documents" would appear in a URI as "My%20Documents". Similarly, several reserved characters have special meanings within a URI. These are outlined in Table 2.2. If a reserved character is meant to appear literally within a URI, it must be encoded as a percent sign followed by the character's two-digit hexadecimal code in the ISO Latin-1 character set.
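The encoding rule can be tried out with the quote and unquote functions in Python's standard urllib.parse module. This is only a small sketch; the question mark is used here as one example of a character with a special meaning in a URI.

```python
from urllib.parse import quote, unquote

# A space becomes "%20".
encoded = quote("My Documents")

# Decoding reverses the substitution.
decoded = unquote(encoded)

# "?" is hexadecimal 3F in ISO Latin-1, so it becomes "%3F".
# safe="" asks quote to encode every character that is not unreserved.
encoded_reserved = quote("?", safe="")
```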

An HTTP URL takes the general form http://<host>:<port>/<path>?<searchpart>. The http portion indicates that the resource is to be retrieved using the HTTP protocol. The <host> is the Internet hostname or IP address of the machine on which the resource resides. The <port>, a numeric value, is an optional parameter needed only if the server is not listening on TCP port 80 (the value assumed if <port> is not specified). The <path> portion is either an absolute or a relative path locating the resource within the server's file structure. If the <path> is not specified, the server should respond with a default HTML file; this is typically how a site's home page is accessed. You can specify the <path>, however, if you know the exact URL of the resource you're interested in. The default file's location is typically set in the server software's setup or configuration program. If the resource can be searched, the ?<searchpart> portion can be provided to tell the server how the resource should be searched. This item is both server and resource specific. Later chapters address these searchable resources in depth.
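The parts of an HTTP URL described above (host, optional port, path, and search portion) can be pulled apart with a few lines of Python. This is a minimal sketch; split_http_url and DEFAULT_HTTP_PORT are hypothetical names chosen for this example, and port 8080 and the search string "roses" are made-up values.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # assumed when the URL names no port

def split_http_url(url):
    """Return the host, port, path, and search portion of an HTTP URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("not an HTTP URL: " + url)
    return {
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        "path": parts.path,      # empty when no path is given
        "search": parts.query,   # text after "?", empty when absent
    }

full = split_http_url("http://myserver.com:8080/user1/default.htm?roses")
home = split_http_url("http://myserver.com")
```

In the second call the path comes back empty, which corresponds to the case where the server is left to choose its default file.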