Class HttpRequest
Similar to the URLConnection class. Caches connections to hosts and reuses them when possible. Talks HTTP/1.1 to the hosts in order to keep connections alive as much as possible.
The sequence of events for using an HttpRequest is similar to how URLConnection is used:
- A new HttpRequest object is constructed.
- The setup parameters are modified (for example with setMethod, setProxy, setRequestHeader, or getOutputStream).
- The host (or proxy) is contacted and the HTTP request is issued (connect).
- The response headers and body are examined (getResponseCode, getResponseHeader, getInputStream).
- The connection is closed (close).
In the common case, all the setup parameters are initialized to sensible values and won't need to be modified. Most users will only need to construct a new HttpRequest object and then call getInputStream to read the contents. The rest of the member variables and methods are only needed for advanced behavior.
The HttpRequest class is intended to be a replacement for the URLConnection class. It operates at a lower level and makes fewer decisions on behavior. Some differences between the HttpRequest class and the URLConnection class follow:
- There are no undocumented global variables (specified in System.getProperties) that modify the behavior of HttpRequest.
- HttpRequest does not automatically follow redirects.
- HttpRequest does not turn HTTP responses with a status code other than "200 OK" into IOExceptions. Sometimes it is necessary, and even quite useful, to examine the results of an "unsuccessful" HTTP request.
- HttpRequest issues HTTP/1.1 requests and handles HTTP/0.9, HTTP/1.0, and HTTP/1.1 responses.
- The URLConnection class leaks open sockets if there is an error reading the response or if the target does not use Keep-Alive. In these cases it depends on the garbage collector to close and release the open socket, which is unreliable: the program may intermittently run out of sockets if the garbage collector doesn't run often enough.
- If the user doesn't read all the data from a URLConnection, there are bugs in its implementation (as of JDK 1.2) that may cause the program to block forever and/or read an insufficient amount of data before trying to reuse the underlying socket.
A number of the fields in the HttpRequest object are public, by design. Most of the methods mentioned above are convenience methods; the underlying data fields are meant to be accessed for more complicated operations, such as changing the socket factory or accessing the raw HTTP response line. Note, however, that the order of the methods described above is important. For instance, the user cannot examine the response headers (by calling getResponseHeader or by examining the variable responseHeaders) without first having connected to the host.
For users who want to modify the default behavior, the HttpRequest consults a number of variables and automatically sets some HTTP headers when sending the request. The user can change these settings up until the time connect is called, as follows:
- variable version
  By default, the HttpRequest issues HTTP/1.1 requests. The user can set version to change this to HTTP/1.0.
- variable method
  If method is null (the default), the HttpRequest decides what the HTTP request method should be as follows: if the user has called getOutputStream, the method will be "POST"; otherwise the method will be "GET".
- variable proxyHost
  If the proxy host is specified, the HTTP request will be sent via the specified proxy:
  - connect opens a connection to the proxy.
  - uses the "Proxy-Connection" header to keep alive the connection.
  - sends a fully qualified URL in the request line, for example "http://www.foo.com/index.html". The fully qualified URL tells the proxy to forward the request to the specified host.
  Otherwise, the HTTP request will be sent directly to the remote host:
  - connect opens a connection to the remote host.
  - uses the "Connection" header to keep alive the connection.
  - sends a host-relative URL in the request line, for example "/index.html". The relative URL is derived from the fully qualified URL used to construct this HttpRequest.
- header "Connection" or "Proxy-Connection"
  The HttpRequest sets the appropriate connection header to "Keep-Alive" to keep alive the connection to the host or proxy (respectively). By setting the appropriate connection header, the user can control whether the HttpRequest tries to use Keep-Alives.
- header "Host"
  The HTTP/1.1 protocol requires that the "Host" header be set to the name of the machine being contacted. By default, this is derived from the URL used to construct the HttpRequest, and is set automatically if the user does not set it.
- header "Content-Length"
  If the user calls getOutputStream and writes some data to it, the "Content-Length" header will be set to the amount of data that has been written at the time that connect is called.
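
The defaults listed above can be illustrated with a small, self-contained sketch. It does not use HttpRequest itself: the class RequestHeadSketch and its build method are hypothetical names, a non-null body stands in for "the user has called getOutputStream and written data", and appending an explicit port to the "Host" header is an assumption rather than documented behavior.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: assembles a request head using the defaults described above. */
class RequestHeadSketch {
    /**
     * @param url       fully qualified "http:" URL
     * @param method    explicit HTTP method, or null to choose automatically
     * @param proxyHost proxy host name, or null for a direct connection
     * @param body      data written before connecting, or null if none
     */
    static String build(String url, String method, String proxyHost, byte[] body) {
        URI uri = URI.create(url);

        // method == null: "POST" if upload data exists, otherwise "GET".
        if (method == null) {
            method = (body != null) ? "POST" : "GET";
        }

        // Via a proxy: fully qualified URL. Direct: host-relative URL.
        String target;
        if (proxyHost != null) {
            target = url;
        } else {
            String path = uri.getRawPath();
            target = (path == null || path.isEmpty()) ? "/" : path;
            if (uri.getRawQuery() != null) {
                target += "?" + uri.getRawQuery();
            }
        }

        Map<String, String> headers = new LinkedHashMap<>();
        // Keep-alive header name depends on whether a proxy is in use.
        headers.put(proxyHost != null ? "Proxy-Connection" : "Connection", "Keep-Alive");
        // "Host" is derived from the URL (explicit port appended: an assumption).
        String host = uri.getHost();
        if (uri.getPort() != -1) {
            host += ":" + uri.getPort();
        }
        headers.put("Host", host);
        // "Content-Length" reflects the data written so far.
        if (body != null) {
            headers.put("Content-Length", Integer.toString(body.length));
        }

        StringBuilder head = new StringBuilder();
        head.append(method).append(' ').append(target).append(" HTTP/1.1\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            head.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        return head.append("\r\n").toString();
    }
}
```

Calling build("http://www.foo.com/index.html", null, null, null) yields a request line of "GET /index.html HTTP/1.1", while passing a proxy host switches the request line to the fully qualified URL and the keep-alive header to "Proxy-Connection".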
Once all data has been read from the remote host, the underlying socket may be automatically recycled and used again for subsequent requests to the same remote host. If the user is not planning on reading all the data from the remote host, the user should call close to release the socket. Although it happens under the covers, the user should be aware that close is called automatically if an IOException occurs or once the data has been read normally from the remote host. This ensures that the minimal number of sockets are left open at any time.
The input stream that getInputStream provides automatically hides whether the remote host is providing HTTP/1.1 "chunked" encoding or regular streaming data. The user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. Currently, no access is provided to the underlying raw input stream.
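
To make "de-chunked" concrete, the following is a minimal sketch of decoding an HTTP/1.1 chunked body with only the standard library. It is not the HttpRequest implementation; DechunkSketch and its methods are hypothetical names, and any trailer headers are simply skipped rather than collected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes an HTTP/1.1 "chunked" body into plain bytes. */
class DechunkSketch {
    /** Reads one line, dropping the CR and LF characters. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    static byte[] dechunk(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');      // ignore chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size;
            try {
                size = Integer.parseInt(sizeLine.trim(), 16);
            } catch (NumberFormatException e) {
                throw new IOException("malformed chunk size: " + sizeLine);
            }
            if (size == 0) {
                break;                             // last chunk
            }
            byte[] buf = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(buf, off, size - off);
                if (n < 0) {
                    throw new IOException("unexpected end of chunked body");
                }
                off += n;
            }
            out.write(buf, 0, size);
            readLine(in);                          // consume CRLF after chunk data
        }
        // Skip optional trailer headers up to the terminating blank line.
        while (!readLine(in).isEmpty()) {
            // trailers ignored in this sketch
        }
        return out.toByteArray();
    }

    /** Convenience wrapper for in-memory experiments. */
    static String dechunkToString(String raw) {
        try {
            byte[] body = dechunk(new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1)));
            return new String(body, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller of getInputStream never sees the chunk-size lines that this sketch strips; it only sees the concatenated chunk data.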
- Version: 2.7
- Author: Colin Stevens (colin.stevens@sun.com)
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected boolean connected
- static String defaultHTTPVersion: The default HTTP version string to send to the remote host when issuing requests.
- static String defaultProxyHost: The default proxy host for HTTP requests.
- static int defaultProxyPort: The default proxy port for HTTP requests.
- static boolean displayAllHeaders: Setting this to "true" causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.
- static int DRAIN_TIMEOUT: Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.
- host: The host extracted from the URL used to construct this HttpRequest.
- static int LINE_LIMIT: Maximum length of a line in the HTTP response headers (sanity check).
- method: The HTTP method, such as "GET", "POST", or "HEAD".
- static HttpSocketPool pool: The cache of idle sockets.
- int port: The port extracted from the URL used to construct this HttpRequest.
- proxyHost: If non-null, sends this HTTP request via the specified proxy host and port.
- int proxyPort: The proxy port.
- requestHeaders: The headers for the HTTP request.
- responseHeaders: The headers that were present in the HTTP response.
- responseTrailers: An artifact of HTTP/1.1 chunked encoding.
- static SocketFactory socketFactory: The factory for constructing new Socket objects used to connect to remote hosts when issuing HTTP requests.
- status: The status line from the HTTP response.
- url: The URL used to construct this HttpRequest.
- version: The HTTP version string.
-
Constructor Summary
Constructors (Constructor, Description):
- HttpRequest(String url): Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
- HttpRequest(URL url): Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- int addHeaders(String tokens, Properties props): Convenience method for adding request headers by looking them up in a properties object.
- void close(): Gracefully closes this HTTP request when the user is done with it.
- void connect(): Connect to the target host (or proxy), send the request, and read the response headers.
- void disconnect(): Interrupts this HTTP request.
- getContent: Return the content as a string.
- getContent(String encoding): Get the content as a string.
- int getContentLength(): Convenience method to get the "Content-Length" header from the HTTP response.
- getInputStream: Gets an input stream that can be used to read the body of the HTTP response.
- getOutputStream: Gets an output stream that can be used for uploading data to the host.
- int getResponseCode(): Gets the HTTP response status code.
- getResponseHeader(String key): Gets the value associated with the given case-insensitive header name from the HTTP response.
- static void main: Grab HTTP document(s) and save them in the filesystem.
- static void removePointToPointHeaders(MimeHeaders headers, boolean response): Removes all the point-to-point (hop-by-hop) headers from the given mime headers.
- void setMethod: Sets the HTTP method to the specified value.
- void setProxy: Sets the proxy for this request.
- void setRequestHeader(String key, String value): Sets a request header in the HTTP request that will be issued.
-
Field Details
-
DRAIN_TIMEOUT
public static int DRAIN_TIMEOUT
Timeout (in msec) to drain an input stream that has been closed before the entire HTTP response has been read.
If the user closes the HttpRequest before reading all of the data, but the remote host has agreed to keep this socket alive, we need to read and discard the rest of the response before issuing a new request. If it takes longer than DRAIN_TIMEOUT to read and discard the data, we will just forcefully close the connection to the remote host rather than waiting to read any more.
Default value is 10000.
-
LINE_LIMIT
public static int LINE_LIMIT
Maximum length of a line in the HTTP response headers (sanity check).
If an HTTP response line is longer than this, the response is considered to be malformed.
Default value is 1000.
-
defaultHTTPVersion
The default HTTP version string to send to the remote host when issuing requests.
The default value can be overridden on a per-request basis by setting the version instance variable.
Default value is "HTTP/1.1".
-
defaultProxyHost
The default proxy host for HTTP requests. If non-null, then all new HTTP requests will be sent via this proxy. If null, then all new HTTP requests are sent directly to the host specified when the HttpRequest object was constructed.
The default value can be overridden on a per-request basis by calling the setProxy method or setting the proxyHost instance variable.
Default value is null.
-
defaultProxyPort
public static int defaultProxyPort
The default proxy port for HTTP requests.
Default value is 80.
-
socketFactory
The factory for constructing new Socket objects used to connect to remote hosts when issuing HTTP requests. The user can set this to provide a new type of socket, such as SSL sockets.
Default value is null, which signifies plain sockets.
-
pool
The cache of idle sockets. Once a request has been handled, the now-idle socket can be remembered and reused later if another HTTP request is made to the same remote host.
-
url
The URL used to construct this HttpRequest.
-
host
The host extracted from the URL used to construct this HttpRequest.
-
port
public int port
The port extracted from the URL used to construct this HttpRequest.
-
proxyHost
If non-null, sends this HTTP request via the specified proxy host and port.
Initialized from defaultProxyHost, but may be changed by the user at any time up until the HTTP request is actually sent.
-
proxyPort
public int proxyPort
The proxy port.
-
connected
protected boolean connected
-
method
The HTTP method, such as "GET", "POST", or "HEAD".
May be set by the user at any time up until the HTTP request is actually sent.
-
version
The HTTP version string.
Initialized from defaultHTTPVersion, but may be changed by the user at any time up until the HTTP request is actually sent.
-
requestHeaders
The headers for the HTTP request. All of these headers will be sent when the connection is actually made.
-
displayAllHeaders
public static boolean displayAllHeaders
Setting this to "true" causes all HTTP headers to be printed on the standard error stream; useful for debugging client/server interactions.
-
status
The status line from the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.
-
responseHeaders
The headers that were present in the HTTP response. This field is not valid until after connect has been called and the HTTP response has been read.
-
responseTrailers
An artifact of HTTP/1.1 chunked encoding. At the end of an HTTP/1.1 chunked response, there may be more MimeHeaders. It is only possible to access these MimeHeaders after all the data from the input stream returned by getInputStream has been read. At that point, this field will automatically be initialized to the set of any headers that were found. If not reading from an HTTP/1.1 chunked source, then this field is irrelevant and will remain null.
-
-
Constructor Details
-
HttpRequest
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
The host specified by the URL is not contacted at this time.
- Parameters:
  url - A fully qualified "http:" URL.
- Throws:
  IllegalArgumentException - if url is not an "http:" URL.
-
HttpRequest
Creates a new HttpRequest object that will send an HTTP request to fetch the resource represented by the URL.
The host specified by the URL is not contacted at this time.
- Parameters:
  url - A string representing a fully qualified "http:" URL.
- Throws:
  IllegalArgumentException - if url is not a well-formed "http:" URL.
-
-
Method Details
-
setMethod
Sets the HTTP method to the specified value. Some of the normal HTTP methods are "GET", "POST", "HEAD", "PUT", "DELETE", but the user can set the method to any value desired.
If this method is called, it must be called before connect is called. Otherwise it will have no effect.
- Parameters:
  method - The string for the HTTP method, or null to allow this HttpRequest to pick the method for itself.
-
setProxy
Sets the proxy for this request. The HTTP proxy request will be sent to the specified proxy host.
If this method is called, it must be called before connect is called. Otherwise it will have no effect.
- Parameters:
  proxyHost - The proxy that will handle the request, or null to not use a proxy.
  proxyPort - The port on the proxy, for the proxy request. Ignored if proxyHost is null.
-
setRequestHeader
Sets a request header in the HTTP request that will be issued. To do fancier things, like appending a value to an existing request header, the user may directly access the requestHeaders variable.
If this method is called, it must be called before connect is called. Otherwise it will have no effect.
- Parameters:
  key - The header name.
  value - The value for the request header.
-
getOutputStream
Gets an output stream that can be used for uploading data to the host.
If this method is called, it must be called before connect is called. Otherwise it will have no effect.
Currently the implementation is not as good as it could be. The user should avoid uploading huge amounts of data, for some definition of huge.
- Throws:
  IOException
-
connect
Connect to the target host (or proxy), send the request, and read the response headers. Any setup routines must be called before the call to this method, and routines to examine the result must be called after this method.
- Throws:
  UnknownHostException - if the target host (or proxy) could not be contacted.
  IOException - if there is a problem writing the HTTP request or reading the HTTP response headers.
-
getInputStream
Gets an input stream that can be used to read the body of the HTTP response. Unlike the other convenience methods for accessing the HTTP response, this one automatically connects to the target host if not already connected.
The input stream that getInputStream provides automatically hides the differences between "Content-Length", no "Content-Length", and "chunked" for HTTP/1.0 and HTTP/1.1 responses. In all cases, the user can simply read until reaching the end of the input stream, which signifies that all the available data from this request has been read. (If reading from a "chunked" source, the data is automatically de-chunked as it is presented to the user. There is no way to access the raw underlying stream that contains the HTTP/1.1 chunking packets.)
- Throws:
  IOException - if there is a problem connecting to the target.
-
close
public void close()
Gracefully closes this HTTP request when the user is done with it.
The user can either call this method or close on the input stream obtained from the getInputStream method; the results are the same.
When all the response data is read from the input stream, the input stream is automatically closed (recycled). If the user is not going to read all the response data from the input stream, the user must call close to release the resources associated with the open request. Otherwise the program may consume all available sockets, waiting forever for the user to finish reading.
Note that the input stream is automatically closed if the input stream throws an exception while reading.
In order to interrupt a pending I/O operation in another thread (for example, to stop a request that is taking too long), the user should call disconnect or interrupt the blocked thread. The user should not call close in this case because close will not interrupt the pending I/O operation.
Closing the request multiple times is allowed.
In order to make sure that open sockets are not left lying around, the user should use code similar to the following:
    OutputStream out = ...
    HttpRequest http = new HttpRequest("http://bob.com/index.html");
    try {
        HttpInputStream in = http.getInputStream();
        in.copyTo(out);
    } finally {
        // Copying to "out" could have failed. Close "http" in case
        // not all the data has been read from it yet.
        http.close();
    }
-
disconnect
public void disconnect()
Interrupts this HTTP request. Can be used to halt an in-progress HTTP request from another thread, by causing it to throw an InterruptedIOException during the connect or while reading from the input stream, depending upon what state this HTTP request is in when it is disconnected.
-
getResponseCode
public int getResponseCode()
Gets the HTTP response status code. From responses like:
    HTTP/1.0 200 OK
    HTTP/1.0 401 Unauthorized
this method extracts the integers 200 and 401 respectively. Returns -1 if the response status code was malformed.
If this method is called, it must be called after connect has been called. Otherwise the information is not yet available and this method will return -1.
For advanced features, the user can directly access the status variable.
-
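
The extraction that getResponseCode describes can be sketched in a few lines. This is an illustration of the documented behavior (status code as an integer, -1 when the line is missing or malformed), not the class's actual code; StatusLineSketch and parseCode are hypothetical names.

```java
/** Hypothetical helper: pulls the numeric status code out of an HTTP status line. */
class StatusLineSketch {
    /** Returns the status code, or -1 if the line is absent or cannot be parsed. */
    static int parseCode(String statusLine) {
        if (statusLine == null) {
            return -1;                       // e.g. no response has been read yet
        }
        // Expected shape: HTTP-version SP status-code SP reason-phrase
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;                       // malformed status code
        }
    }
}
```

Because HttpRequest does not turn non-200 responses into exceptions, a caller would typically branch on this value before deciding whether to read the body.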
getResponseHeader
Gets the value associated with the given case-insensitive header name from the HTTP response.
If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return null.
For advanced features, such as enumerating over all response headers, the user should directly access the responseHeaders variable.
- Parameters:
  key - The case-insensitive name of the response header.
- Returns:
  The value associated with the given name, or null if there is no such header in the response.
-
getContentLength
public int getContentLength()
Convenience method to get the "Content-Length" header from the HTTP response.
If this method is called, it must be called after connect has been called. Otherwise the information is not available and this method will return -1.
- Returns:
  The content length specified in the response headers, or -1 if the length was not specified or malformed (not a number).
-
removePointToPointHeaders
Removes all the point-to-point (hop-by-hop) headers from the given mime headers.
- Parameters:
  headers - The mime headers to be modified.
  response - true to remove the point-to-point response headers, false to remove the point-to-point request headers.
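
As a rough illustration of what removing hop-by-hop headers involves, the sketch below works on a plain Map instead of MimeHeaders. The header names are the hop-by-hop headers listed in RFC 2616 section 13.5.1; how they are divided between requests and responses here is an assumption, and HopByHopSketch is a hypothetical name, so the real method may remove a different set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: strips hop-by-hop headers from a mutable header map. */
class HopByHopSketch {
    // Names are compared in lower case because header names are case-insensitive.
    static final Set<String> BOTH = new HashSet<>(Arrays.asList(
            "connection", "keep-alive", "trailers", "transfer-encoding", "upgrade"));
    // Assumed split: these two only make sense on requests...
    static final Set<String> REQUEST_ONLY = new HashSet<>(Arrays.asList(
            "proxy-authorization", "te"));
    // ...and this one only on responses.
    static final Set<String> RESPONSE_ONLY = new HashSet<>(Arrays.asList(
            "proxy-authenticate"));

    /**
     * @param headers  header map to modify in place
     * @param response true for response headers, false for request headers
     */
    static void removeHopByHop(Map<String, String> headers, boolean response) {
        Set<String> extra = response ? RESPONSE_ONLY : REQUEST_ONLY;
        headers.keySet().removeIf(name -> {
            String key = name.toLowerCase(Locale.ROOT);
            return BOTH.contains(key) || extra.contains(key);
        });
    }
}
```

This kind of filtering matters mainly when relaying a request or response through an intermediary, where headers such as "Connection" apply only to a single hop.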
-
addHeaders
Convenience method for adding request headers by looking them up in a properties object.
- Parameters:
  tokens - A white space delimited set of tokens that refer to headers that will be added to the HTTP request.
  props - Keys of the form [token].name and [token].value are used to look up additional HTTP headers to be added to the request.
- Returns:
  The number of headers added to the request.
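
The [token].name and [token].value lookup can be pictured with the following sketch, which collects headers into a plain Map. AddHeadersSketch is a hypothetical name, and skipping a token when either key is missing is an assumption about the behavior, not something the description above states.

```java
import java.util.Map;
import java.util.Properties;
import java.util.StringTokenizer;

/** Hypothetical helper: looks up request headers in a Properties object by token. */
class AddHeadersSketch {
    /** Returns the number of headers added to the given map. */
    static int addHeaders(Map<String, String> headers, String tokens, Properties props) {
        int count = 0;
        StringTokenizer st = new StringTokenizer(tokens);   // splits on white space
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            String name = props.getProperty(token + ".name");
            String value = props.getProperty(token + ".value");
            // Assumption: a token contributes a header only if both keys exist.
            if (name != null && value != null) {
                headers.put(name, value);
                count++;
            }
        }
        return count;
    }
}
```

For example, with properties ua.name=User-Agent and ua.value=sketch/1.0, the token string "ua" adds a single "User-Agent" header.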
-
getContent
Get the content as a string. Uses the character encoding specified in the HTTP headers if available. Otherwise the supplied encoding is used, or (if encoding is null) the platform default encoding.
- Parameters:
  encoding - The ISO character encoding to use, if the encoding can't be determined by context.
- Returns:
  The content as a string.
- Throws:
  IOException
  UnsupportedEncodingException
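
The fallback order described for getContent (encoding from the headers, then the supplied encoding, then the platform default) can be sketched as below. The sketch assumes the header in question is "Content-Type" with a charset parameter, which the description does not spell out; ContentCharsetSketch and choose are hypothetical names.

```java
import java.nio.charset.Charset;

/** Hypothetical helper: picks a character set using the fallback order described above. */
class ContentCharsetSketch {
    /**
     * @param contentType value of the "Content-Type" response header, or null
     * @param fallback    caller-supplied encoding name, or null
     */
    static Charset choose(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive match on the "charset=" parameter.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal charset name: fall through to the fallback.
                    }
                }
            }
        }
        return (fallback != null) ? Charset.forName(fallback) : Charset.defaultCharset();
    }
}
```

Unlike the method documented above, this sketch throws an unchecked exception if the fallback name itself is unsupported.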
-
getContent
Return the content as a string.
-
getEncoding
-
main
Grab HTTP document(s) and save them in the filesystem. This is a simple batch HTTP URL fetcher. Usage:
    java ... sunlabs.brazil.request.HttpRequest [-v(erbose)] [-h(headers)] [-p<http://proxyhost:port>] url...
- -v
  Verbose. Print the target URL and destination file on stderr.
- -h
  Print all the HTTP headers on stderr.
- -phttp://proxyhost:port
  The following URLs are to be fetched via a proxy.
There are many limitations: only HTTP GET requests are supported; the output filename is derived automatically from the URL and can't be overridden; if a destination file already exists, it is overwritten.
- Throws:
  Exception
-