Noodle Technical Manual
Introduction
This document describes how to drive Noodle from a servlet and how
to write Noodle filters. The Noodle servlet will do
everything for you, but it's mainly intended to serve as an
example--unless your needs are very basic, you will probably have to
write your own servlet along the lines of the Noodle
servlet.
Noodle URLs
There are two ways of using the provided Noodle
servlet; these apply to any other servlet you might write that drives
Noodle. The first is easier to explain but requires modifying the
query string, which can lead to conflicts on the other side of Noodle.
http://www.foo.com/noodle/Noodle?page=/noodle/NoodleTest
The Noodle servlet takes the path value from the
page variable and then connects to the host and port
specified in the NoodleResources.properties file, running the
NoodleTest servlet and returning the results. Please look
at the javadoc
for the NoodleTest servlet for more information on what is being
tested and why.
The second approach uses request attributes to pass along the page
information. This approach does not modify the query string, avoiding
conflicts within the query string, and making the query string
available in its original form to resources on the other side of the
Noodle proxy. However, the rules for setting this up are somewhat
servlet container specific. Here's an example using mod_rewrite (for
the URL manipulation layer) and mod_jk with Apache 2.0 and Tomcat 4.
RewriteRule .* - [E=org.tigris.noodle.page:]
RewriteCond %{HTTP_HOST} !=127.0.0.1
RewriteRule (/source/.*) /noodle/Noodle [PT,L,E=org.tigris.noodle.page:$1]
JkEnvVar org.tigris.noodle.page NONE
The first rewrite rule is optional; it allows mixing the two styles
described here, by setting the default value for org.tigris.noodle.page
to the empty string (otherwise, the environment variable will take
precedence over the query string, and the default value -- shown here
as NONE -- will be used). The RewriteCond restricts the second
RewriteRule to requests whose Host header is not 127.0.0.1. The second
RewriteRule changes the URL to point to the Noodle servlet, and stores
the page to proxy in the org.tigris.noodle.page Apache
environment variable. The JkEnvVar directive tells mod_jk to pass
along the Apache environment variable org.tigris.noodle.page with a
default value of NONE.
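The precedence between the two styles can be sketched in plain Java. This is only an illustration of the rule described above, not the Noodle servlet's actual code: the class PageResolver and its method are hypothetical, and it assumes the servlet has already read the org.tigris.noodle.page attribute and the page query parameter into strings.

```java
final class PageResolver {
    /**
     * Hypothetical helper: a non-empty attribute value wins over the query
     * string; an empty or missing attribute falls back to the query string.
     */
    static String resolvePage(String attributeValue, String queryValue) {
        if (attributeValue != null && !attributeValue.isEmpty()) {
            return attributeValue;
        }
        return queryValue;
    }

    public static void main(String[] args) {
        // Attribute set by the second RewriteRule: it takes precedence.
        System.out.println(resolvePage("/source/browse", "/noodle/NoodleTest"));
        // Attribute reset to the empty string by the first RewriteRule.
        System.out.println(resolvePage("", "/noodle/NoodleTest"));
        // Without the first RewriteRule the JkEnvVar default hides the query string.
        System.out.println(resolvePage("NONE", "/noodle/NoodleTest"));
    }
}
```

The last call shows why the optional first rule matters: without it, the default value NONE is treated as the page and the query string is never consulted.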
NoodleData
To run a request through Noodle you must create a NoodleData object, call any needed
setup methods on it (setURL,
at a minimum), and then call the proxyRequest method to
set up the proxy and stream to your response.
To construct a NoodleData object you must provide an
HttpServletRequest object and an
HttpServletResponse object. These are the client
request and client response. When you call the
proxyRequest method, Noodle will create a proxy
request and run filters you specify (in the third argument to the
NoodleData constructor) against the resulting proxy
response. The filters you specify will write a (potentially
modified) version of the proxy response to the client response.
You must also provide a Properties object to the
NoodleData constructor, defining the filter set you want to run against the proxy
request and response.
NoodleData allows overrides for methods which are related to determining
how the input stream should be handled. The proxied response can be handled
either as a stream of bytes or as a stream of characters. In the former
case, no encoding transformations will happen; in the latter, Noodle
will attempt to determine the character set / encoding of the proxied
data and will try to convert from that to Unicode (Java's internal format);
then the data will be transformed into bytes using
NoodleData.getOutputCharset(). Filters will see this data as bytes, but
if the data is treated as a character stream, they'll get blocks of bytes
which respect character boundaries -- assuming the proper character set
was determined.
The character set is guessed based on the HTTP Content-type header
in the proxied response, if present. If that's not present, the
character set will be guessed by looking for a <meta
http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
tag within the beginning of the response. If neither is found, Noodle
will default to using ISO-8859-1 for both input and output character
sets (note that this means that NoodleData.getOutputCharset() will not
be used in this case).
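The lookup order described above (header, then meta tag, then ISO-8859-1) might look roughly like this. It is a sketch, not Noodle's implementation: the class CharsetGuess, its regular expressions, and the choice to fall through to the meta tag when the header carries no charset parameter are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetGuess {
    // Matches a charset parameter such as: charset=Shift_JIS
    private static final Pattern CHARSET = Pattern.compile(
        "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);
    // Matches a <meta> tag that has an http-equiv="Content-Type" attribute.
    private static final Pattern META = Pattern.compile(
        "<meta[^>]*http-equiv\\s*=\\s*[\"']?Content-Type[\"']?[^>]*>",
        Pattern.CASE_INSENSITIVE);

    /** Hypothetical helper: header first, then meta tag, then ISO-8859-1. */
    static String guess(String contentTypeHeader, String startOfBody) {
        String fromHeader = findCharset(contentTypeHeader);
        if (fromHeader != null) {
            return fromHeader;
        }
        if (startOfBody != null) {
            Matcher meta = META.matcher(startOfBody);
            if (meta.find()) {
                String fromMeta = findCharset(meta.group());
                if (fromMeta != null) {
                    return fromMeta;
                }
            }
        }
        return "ISO-8859-1";
    }

    private static String findCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

A subclass that needs different rules would override getReader rather than reuse a helper like this one.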
NoodleData subclasses can override this behavior entirely by overriding
getReader, or they can simply determine whether or not to use a character
stream by overriding useCharacterReader. By default, Noodle will use a
character stream for text/html data, and a byte stream for all other kinds
of input.
Filter sets
The NoodleData constructor takes as its third argument
a Properties object containing a definition of the filter
set you want applied to the proxied response. The
Properties fragment you pass in should look something
like this:
filter.request.0=com.me.filter.MyFirstRequestFilter
filter.request.1=com.me.filter.MySecondRequestFilter
filter.response.0=org.tigris.noodle.filter.CheckForRedirect
filter.response.1=org.tigris.noodle.filter.HandleContentType
filter.response.2=org.tigris.noodle.filter.CopyCookies
filter.response.3=org.tigris.noodle.filter.CopyAllHeaders
filter.response.4=com.me.filter.MyFirstResponseFilter
filter.response.5=com.me.filter.MySecondResponseFilter
filter.response.finalizer.0=com.me.MyFirstFinalizerFilter
filter.response.finalizer.1=com.me.MySecondFinalizerFilter
filter.request.[ordinal] designates a proxy request
filter. filter.response.[ordinal] designates a proxy response filter.
filter.response.finalizer.[ordinal] designates a proxy response finalizer filter. Filters of
a certain type will be executed in the order given by their ordinals
in the file.
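To make the ordering rule concrete, here is one way the keys of such a Properties object could be turned into an ordered list of class names. It is a sketch only; FilterSetReader is a hypothetical helper and says nothing about how NoodleData actually reads the filter set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class FilterSetReader {
    /**
     * Hypothetical helper: returns the class names whose keys are
     * prefix + ordinal, sorted by the numeric value of the ordinal.
     */
    static List<String> classNames(Properties props, String prefix) {
        Map<Integer, String> byOrdinal = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            String tail = key.substring(prefix.length());
            // Skip keys such as filter.response.finalizer.0 when the
            // prefix is filter.response. (the tail is not a plain number).
            if (tail.isEmpty() || !tail.chars().allMatch(Character::isDigit)) {
                continue;
            }
            byOrdinal.put(Integer.parseInt(tail), props.getProperty(key));
        }
        return new ArrayList<>(byOrdinal.values());
    }
}
```

Calling classNames(props, "filter.response.") on the fragment above would return the six response filters in ordinal order and leave the finalizers for a separate call with the prefix "filter.response.finalizer.".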
Setup Methods
You can call any of these NoodleData methods before
calling proxyRequest, or within the filter
method of any NoodleRequestFilter. Calling these methods
in a response filter will not do you any
good, since by the time response filters get run the proxy connection
has already been established and the request has been sent off.
- You can change the URL being used to access the proxy using the
setURL method.
- You can set the HTTP headers to be sent to the proxy using the
setHeadersToSend method.
- You can set the POST data to be sent to the proxy using the
setPostData method, and set the GET data to be sent to
the proxy using the setQueryData method.
Noodle Filters
Filters are the meat of Noodle; they allow you to change the proxy
request and response in arbitrary ways invisible to the client. There
are three types of filters: proxy request
filters, proxy response filters, and
proxy response finalizer
filters.
Filters are executed by NoodleData in the order
specified in the filter set.
Note that, like servlets, filters must be implemented in a
thread-safe manner. If your filter has any state that needs to persist
between data chunks, you can store it and retrieve it later by using
the getValue and putValue methods of
NoodleData.
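The idea of keeping state out of the filter instance can be illustrated with stand-in classes. Everything here is hypothetical: RequestData merely imitates per-request storage, and the key/value signatures shown for putValue and getValue are assumptions, so check the javadoc for the real ones.

```java
import java.util.HashMap;
import java.util.Map;

final class StateSketch {
    /** Hypothetical stand-in for the per-request storage NoodleData offers. */
    static final class RequestData {
        private final Map<String, Object> values = new HashMap<>();

        void putValue(String key, Object value) {
            values.put(key, value);
        }

        Object getValue(String key) {
            return values.get(key);
        }
    }

    /** A filter with no instance fields: its running total lives in the request data. */
    static final class ByteCounter {
        static final String KEY = "ByteCounter.total";

        int filter(RequestData data, byte[] block) {
            Integer soFar = (Integer) data.getValue(KEY);
            int total = (soFar == null ? 0 : soFar) + block.length;
            data.putValue(KEY, total);
            return total;
        }
    }
}
```

Because ByteCounter holds nothing between calls, one instance can serve interleaved requests without their totals mixing.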
Sample filters
Several proxy response filters are distributed with Noodle to serve
as examples. Together with Noodle, these filters provide almost all of
the functionality of a standard HTTP proxy for GET and POST requests.
CheckForRedirect will propagate a redirect from the
proxy response into the client response.
CopyCookies will copy over Set-Cookie headers from
the proxy response to the client response.
CopyAllHeaders will copy all HTTP headers from the proxy
response to the client response.
HandleContentType will propagate the content type of
the proxy response to the client response, and will disable further
filtering if the content type is not "text/html".
Request filters
Request filters should implement the
NoodleRequestFilter interface. They have their
filter methods called once, before the proxy request is
sent. There is no data to stream yet, since there hasn't yet been a
response from the proxy. However, you can call any of the setup methods from within a request filter.
Response filters
Response filters should implement the
NoodleResponseFilter interface. A response filter is
called after the proxy request has been sent, once for every block of
data read from the proxy response, until the filter elects not to read
any more data.
Streaming data
The current data block is provided through the filter method; it is
an instance of ResponseBlock. It will contain a byte array with the bytes,
a length, and a String containing the expected encoding of the data (if
known; if not, the String will be null).
For byte input streams, blocks have a regular size in the range
0-8192 bytes. A block is shorter than 4096 bytes if and only if the
whole response is shorter than 4096 bytes; otherwise every block is
exactly 4096 bytes except the last, which is 4097-8192 bytes. For
character streams, the size may be more irregular, as characters are
read in fixed-size chunks and then transformed into bytes.
Your NoodleFilter is responsible for writing to the
client response all the data it wants to write. If the data it wants
to write includes or replaces the data currently in the ResponseBlock,
then your NoodleFilter is also responsible for updating the
offset of that ResponseBlock to the offset of the first byte not
streamed by your NoodleFilter (if you handle the entire
ResponseBlock this will be the output of
ResponseBlock.getLength()). There is a
method called streamBlockTo which will take care of this
for you in simple cases.
Example: Suppose you have a filter whose tedious task it is to
replace the "p" character with the "q" character whenever it
appears. Let's further say the ResponseBlock block has a byte array
which contains the following 30 bytes:
012345678901234567890123456789
Mahnamahna (bop be ba de bop!)
Your NoodleFilter should call
streamBufferTo(block,13), which will write "Mahnamahna (bo" to
the client response and set the current offset (the first unstreamed
byte) to 14. You should then write a "q" byte to replace the "p" and
manually increment block's offset by one via
block.setOffset(block.getOffset() + 1). A call to
streamBufferTo(block,26) will write " be ba de bo" to the client response
and set block's offset to 27. You should then write another "q" byte,
increment block's offset again, and return (since there are no more
"p" characters; see below for what to return). You should not stream
more bytes than you need to; if there are any unstreamed bytes after
all the filters have run on a block, Noodle will automatically stream
the remainder.
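The offset bookkeeping in this walk-through can be tried out with stand-in classes. This is a sketch under stated assumptions, not Noodle code: Block imitates only the bytes, length, and offset of a ResponseBlock, streamTo imitates the "write up to and including this index" behaviour used in the example, and a ByteArrayOutputStream stands in for the client response.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ReplaceSketch {
    /** Hypothetical stand-in for ResponseBlock: bytes, a length, and an offset. */
    static final class Block {
        final byte[] bytes;
        final int length;
        int offset; // index of the first byte not yet streamed

        Block(byte[] bytes) {
            this.bytes = bytes;
            this.length = bytes.length;
        }
    }

    /** Writes bytes offset..lastIndex (inclusive) and moves the offset past them. */
    static void streamTo(Block block, int lastIndex, ByteArrayOutputStream out) {
        if (lastIndex >= block.offset) {
            out.write(block.bytes, block.offset, lastIndex - block.offset + 1);
            block.offset = lastIndex + 1;
        }
    }

    /** Replaces every 'p' with 'q'; bytes after the last 'p' are left unstreamed. */
    static void filter(Block block, ByteArrayOutputStream out) {
        for (int i = block.offset; i < block.length; i++) {
            if (block.bytes[i] == 'p') {
                streamTo(block, i - 1, out); // everything before the 'p'
                out.write('q');              // the replacement byte
                block.offset++;              // skip the original 'p'
            }
        }
    }

    public static void main(String[] args) {
        Block block = new Block(
            "Mahnamahna (bop be ba de bop!)".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        filter(block, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
        System.out.println("offset=" + block.offset);
    }
}
```

After the call the offset is 28, so the trailing "!)" is still unstreamed, which is the part the text says Noodle streams on its own.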
If you need to do things with the input which can't be done with
byte streams, such as regular expression handling, you are of course
free to turn the byte stream into a string (you should use the
encoding specified with the bytes if possible -- block.getEncoding()),
modify it, and write the modified string out to the response. Just be
sure to update the block's offset to point to the first byte you're not
handling.
Filter Status Codes
Your implementation of filter needs to return one of
three status codes defined in NoodleFilter so that Noodle
knows what to do with the filter on the next block.
- MAINTAIN_THIS_FILTER: Do not change the state of this filter. It
will be called on the next block of data to be streamed, assuming
there is one.
- KILL_THIS_FILTER: Do not run this filter on any more data.
- KILL_ALL_FILTERS: Disable filtering for the remainder of this
request. All unstreamed bytes in the proxy response will be streamed
directly to the client response without going through any filters
(even finalizer filters).
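The effect of the three codes on later blocks can be sketched with a small driver loop. The enum, the use of java.util.function.Function as a filter, and the run method are hypothetical stand-ins chosen to keep the example self-contained; in Noodle the codes are defined in NoodleFilter and the loop lives inside NoodleData.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class StatusSketch {
    /** Stand-in for the three status codes defined in NoodleFilter. */
    enum Status { MAINTAIN_THIS_FILTER, KILL_THIS_FILTER, KILL_ALL_FILTERS }

    /** Runs each live filter on each block; returns a log of "filterIndex:block" calls. */
    static List<String> run(List<Function<String, Status>> filters, List<String> blocks) {
        List<Integer> live = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            live.add(i);
        }
        List<String> calls = new ArrayList<>();
        for (String block : blocks) {
            Iterator<Integer> it = live.iterator();
            while (it.hasNext()) {
                int index = it.next();
                calls.add(index + ":" + block);
                Status status = filters.get(index).apply(block);
                if (status == Status.KILL_THIS_FILTER) {
                    it.remove();      // this filter sees no further blocks
                } else if (status == Status.KILL_ALL_FILTERS) {
                    live.clear();     // no filter sees any further data
                    break;
                }
            }
        }
        return calls;
    }
}
```

With two filters where the second returns KILL_THIS_FILTER on its first call, the log for blocks "a" and "b" is 0:a, 1:a, 0:b.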
Response finalizer filters
These are just like response filters (they must implement
NoodleResponseFilter) except that they are run only once,
on the last block of data read from the proxy response. They are set
loose on the last block after all the regular response filters (if
there are any still alive) have had a shot at it. Whatever value they
return from filter is ignored.
Good luck!
You are now ready to create your own Noodle filters! Have fun!
To learn more about Noodle in general, please see the
Noodle Documentation.