
Noodle Technical Manual

Introduction

This document describes how to drive Noodle from a servlet and how to write Noodle filters. The Noodle servlet will do everything for you, but it is mainly intended to serve as an example. Unless your needs are very basic, you will probably have to write your own servlet along the lines of the Noodle servlet.

Noodle URLs

There are two ways of using the provided Noodle servlet, and they apply equally to any other servlet you write that drives Noodle. The first is easier to explain but requires modifying the query string, which can conflict with query parameters expected by resources on the other side of Noodle.

http://www.foo.com/noodle/Noodle?page=/noodle/NoodleTest

The Noodle servlet takes the path from the page query parameter, connects to the host and port specified in the NoodleResources.properties file, requests that path (here, the NoodleTest servlet), and returns the results. See the javadoc for the NoodleTest servlet for more information on what is being tested and why.
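A minimal sketch of the target URL a servlet in this style might build from the page parameter. The property keys noodle.host and noodle.port are hypothetical (the real names used in NoodleResources.properties are not given here), and only the URL is computed; opening the connection is left to NoodleData.

```java
import java.util.Properties;

/** Sketch: build the proxied URL for the query-string style (?page=...). */
final class PageUrlBuilder {
    // Hypothetical property keys; the real names in NoodleResources.properties may differ.
    static final String HOST_KEY = "noodle.host";
    static final String PORT_KEY = "noodle.port";

    static String targetUrl(Properties resources, String page) {
        if (page == null || !page.startsWith("/")) {
            throw new IllegalArgumentException("page must be an absolute path: " + page);
        }
        String host = resources.getProperty(HOST_KEY, "localhost");
        int port = Integer.parseInt(resources.getProperty(PORT_KEY, "80"));
        // Leave out the port when it is the HTTP default.
        return port == 80 ? "http://" + host + page : "http://" + host + ":" + port + page;
    }

    public static void main(String[] args) {
        Properties resources = new Properties();
        resources.setProperty(HOST_KEY, "127.0.0.1");
        resources.setProperty(PORT_KEY, "8080");
        // The value request.getParameter("page") would return for the example URL above.
        System.out.println(targetUrl(resources, "/noodle/NoodleTest")); // http://127.0.0.1:8080/noodle/NoodleTest
    }
}
```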

The second approach uses request attributes to pass along the page information. It leaves the query string unmodified, which avoids conflicts and makes the query string available in its original form to resources on the other side of the Noodle proxy. However, the setup rules are somewhat specific to the servlet container. Here is an example using mod_rewrite (for URL manipulation) and mod_jk with Apache 2.0 and Tomcat 4:

RewriteRule .* - [E=org.tigris.noodle.page:]
RewriteCond %{HTTP_HOST} !=127.0.0.1
RewriteRule (/source/.*) /noodle/Noodle [PT,L,E=org.tigris.noodle.page:$1]
JkEnvVar org.tigris.noodle.page NONE

The first RewriteRule is optional; it allows mixing the two styles described here by setting the default value of org.tigris.noodle.page to the empty string. Without it, the environment variable takes precedence over the query string, and its default value (NONE here) will be used. The RewriteCond makes the second RewriteRule apply only when the requested host (the HTTP Host header, %{HTTP_HOST}) is not 127.0.0.1. The second RewriteRule changes the URL to point to the Noodle servlet and stores the page to proxy in the org.tigris.noodle.page Apache environment variable. The JkEnvVar directive tells mod_jk to pass the Apache environment variable org.tigris.noodle.page along to Tomcat, with a default value of NONE.
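On the Tomcat side, a servlet then has two possible sources for the page. The sketch below applies the precedence just described, assuming the attribute is read with request.getAttribute("org.tigris.noodle.page") and the query-string style uses request.getParameter("page"). The method name resolvePage is hypothetical, and the two request values are passed in as plain arguments so the example runs without a servlet container.

```java
/** Sketch: pick the page to proxy from the request attribute or the "page" parameter. */
final class PageResolver {
    // Apache variable that JkEnvVar passes to Tomcat as a request attribute.
    static final String PAGE_ATTRIBUTE = "org.tigris.noodle.page";

    /**
     * @param attribute value of request.getAttribute(PAGE_ATTRIBUTE); may be null
     * @param pageParam value of request.getParameter("page"); may be null
     * @return the path to proxy, or null if neither source supplies one
     */
    static String resolvePage(Object attribute, String pageParam) {
        // A non-empty attribute wins over the query string, whatever its value.
        if (attribute instanceof String && !((String) attribute).isEmpty()) {
            return (String) attribute;
        }
        // An empty attribute (set by the first RewriteRule) falls back to ?page=...
        return (pageParam == null || pageParam.isEmpty()) ? null : pageParam;
    }

    public static void main(String[] args) {
        System.out.println(resolvePage("/source/browse/", null));     // /source/browse/
        System.out.println(resolvePage("", "/noodle/NoodleTest"));     // /noodle/NoodleTest
        System.out.println(resolvePage("NONE", "/noodle/NoodleTest")); // NONE
    }
}
```

The last call shows why the first RewriteRule matters: without it, the attribute arrives as NONE, and a servlet following this logic would try to proxy a page named NONE instead of falling back to the query string.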

NoodleData

To run a request through Noodle, create a NoodleData object, call any needed setup methods on it (at a minimum, setURL), and then call its proxyRequest method, which sets up the proxy connection and streams the result to your response.

To construct a NoodleData object you must provide an HttpServletRequest object and an HttpServletResponse object. These are the client request and client response. When you call the proxyRequest method, Noodle will create a proxy request and run filters you specify (in the third argument to the NoodleData constructor) against the resulting proxy response. The filters you specify will write a (potentially modified) version of the proxy response to the client response.

You must also provide a Properties object to the NoodleData constructor, defining the filter set you want to run against the proxy request and response.

NoodleData lets subclasses override the methods that decide how the proxied input stream is handled. The proxied response can be read either as a stream of bytes or as a stream of characters. In the former case, no encoding transformations happen. In the latter, Noodle will try to determine the character set (encoding) of the proxied data and convert the data from that character set to Unicode (Java's internal format); the data is then converted back into bytes using NoodleData.getOutputCharset(). Filters always see the data as bytes, but when it is read as a character stream, each block of bytes respects character boundaries, provided the correct character set was determined.
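To make the character-stream path concrete, here is a minimal sketch using only java.io and java.nio.charset; it is not Noodle's own code. It decodes the proxied bytes through a Reader, reads a fixed number of characters at a time, and re-encodes each chunk with the output charset. The method name transcode and the chunk size are assumptions. Because each chunk is encoded from whole characters (a surrogate pair is held back rather than split), every output block ends on a character boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: read proxied bytes as characters, then re-encode each chunk with the output charset. */
final class Transcoder {
    static List<byte[]> transcode(byte[] proxied, Charset input, Charset output, int charsPerRead)
            throws IOException {
        if (charsPerRead < 2) throw new IllegalArgumentException("need room for a surrogate pair");
        List<byte[]> blocks = new ArrayList<>();
        char[] buf = new char[charsPerRead];
        int carry = 0; // a high surrogate held back from the previous read (0 or 1 chars)
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(proxied), input)) {
            int n;
            while ((n = reader.read(buf, carry, buf.length - carry)) != -1) {
                int total = carry + n;
                // Never end a block in the middle of a surrogate pair.
                int end = Character.isHighSurrogate(buf[total - 1]) ? total - 1 : total;
                if (end > 0) {
                    // Encoding whole characters yields bytes that respect character boundaries.
                    blocks.add(new String(buf, 0, end).getBytes(output));
                }
                carry = total - end;
                if (carry == 1) buf[0] = buf[total - 1];
            }
        }
        if (carry > 0) blocks.add(new String(buf, 0, carry).getBytes(output)); // malformed tail
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "Cr\u00e8me br\u00fbl\u00e9e, na\u00efve caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        for (byte[] block : transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, 8)) {
            System.out.println(block.length + " bytes: " + new String(block, StandardCharsets.UTF_8));
        }
    }
}
```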

The character set is guessed from the charset parameter of the HTTP Content-Type header in the proxied response, if present. If that is not present, Noodle looks for a <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" /> tag near the beginning of the response. If neither is found, Noodle defaults to ISO-8859-1 for both the input and output character sets (note that this means NoodleData.getOutputCharset() is not used in this case).
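The three-step guess can be sketched with java.util.regex as follows. This illustrates the rules above and is not Noodle's implementation: the class and method names are hypothetical, and so is the 1024-byte scan window standing in for "the beginning of the response". The body prefix is decoded as ISO-8859-1 only to search its ASCII markup, since that charset maps every byte to exactly one character.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: Content-Type header, then <meta http-equiv>, then ISO-8859-1. */
final class CharsetGuesser {
    static final String DEFAULT_CHARSET = "ISO-8859-1";
    static final int META_SCAN_BYTES = 1024; // hypothetical size of "the beginning"

    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\s[^>]*http-equiv\\s*=\\s*[\"']?content-type[\"']?[^>]*>",
                    Pattern.CASE_INSENSITIVE);

    static String guess(String contentTypeHeader, byte[] body) {
        // 1. charset parameter of the HTTP Content-Type header
        String fromHeader = charsetParam(contentTypeHeader);
        if (fromHeader != null) return fromHeader;
        // 2. <meta http-equiv="Content-Type" ...> near the start of the body
        String head = new String(body, 0, Math.min(body.length, META_SCAN_BYTES), StandardCharsets.ISO_8859_1);
        Matcher meta = META_TAG.matcher(head);
        if (meta.find()) {
            String fromMeta = charsetParam(meta.group());
            if (fromMeta != null) return fromMeta;
        }
        // 3. nothing found: fall back to ISO-8859-1
        return DEFAULT_CHARSET;
    }

    private static String charsetParam(String s) {
        if (s == null) return null;
        Matcher m = CHARSET_PARAM.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] page = ("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=Shift_JIS\" /></head>").getBytes(StandardCharsets.US_ASCII);
        System.out.println(guess("text/html; charset=UTF-8", page)); // UTF-8
        System.out.println(guess("text/html", page));                // Shift_JIS
        System.out.println(guess(null, new byte[0]));                // ISO-8859-1
    }
}
```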

NoodleData subclasses can override this behavior entirely by overriding getReader, or they can simply determine whether or not to use a character stream by overriding useCharacterReader. By default, Noodle will use a character stream for text/html data, and a byte stream for all other kinds of input.
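A sketch of the second option, overriding only the stream-type decision. NoodleDataSketch is a hypothetical stand-in for NoodleData so the example compiles without Noodle, and the String parameter of useCharacterReader is an assumption; check the javadoc for the real signature.

```java
import java.util.Locale;

/** Stand-in for NoodleData; the real useCharacterReader signature may differ. */
class NoodleDataSketch {
    /** Default described in the manual: characters for text/html, bytes for everything else. */
    protected boolean useCharacterReader(String contentType) {
        return mediaType(contentType).equals("text/html");
    }

    /** Strips parameters and normalizes case: "text/HTML; charset=UTF-8" becomes "text/html". */
    static String mediaType(String contentType) {
        if (contentType == null) return "";
        int semi = contentType.indexOf(';');
        String type = semi < 0 ? contentType : contentType.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }
}

/** Hypothetical subclass that also reads plain text and XML as characters. */
class TextAwareNoodleData extends NoodleDataSketch {
    @Override
    protected boolean useCharacterReader(String contentType) {
        String type = mediaType(contentType);
        return super.useCharacterReader(contentType)
                || type.equals("text/plain")
                || type.equals("text/xml")
                || type.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        TextAwareNoodleData data = new TextAwareNoodleData();
        System.out.println(data.useCharacterReader("text/xml; charset=UTF-8")); // true
        System.out.println(data.useCharacterReader("image/png"));               // false
    }
}
```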

Filter sets

The NoodleData constructor takes as its third argument a Properties object containing a definition of the filter set you want applied to the proxied response. The Properties fragment you pass in should look something like this:

filter.request.0=com.me.filter.MyFirstRequestFilter
filter.request.1=com.me.filter.MySecondRequestFilter
filter.response.0=org.tigris.noodle.filter.CheckForRedirect
filter.response.1=org.tigris.noodle.filter.HandleContentType
filter.response.2=org.tigris.noodle.filter.CopyCookies
filter.response.3=org.tigris.noodle.filter.CopyAllHeaders
filter.response.4=com.me.filter.MyFirstResponseFilter
filter.response.5=com.me.filter.MySecondResponseFilter
filter.response.finalizer.0=com.me.MyFirstFinalizerFilter
filter.response.finalizer.1=com.me.MySecondFinalizerFilter

filter.request.[ordinal] designates a proxy request filter, filter.response.[ordinal] a proxy response filter, and filter.response.finalizer.[ordinal] a proxy response finalizer filter. Filters of each type are executed in ascending order of their ordinals.
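The manual does not show how NoodleData reads these keys. One way to turn such a fragment into three ordered lists is sketched below, assuming ordinals are non-negative integers compared numerically; gaps are tolerated here, which may or may not match Noodle's behavior. Instantiating the named classes is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeMap;

/** Sketch: split a filter-set Properties object into ordered class-name lists. */
final class FilterSet {
    static final String REQUEST = "filter.request.";
    static final String RESPONSE = "filter.response.";
    static final String FINALIZER = "filter.response.finalizer.";

    final List<String> request = new ArrayList<>();
    final List<String> response = new ArrayList<>();
    final List<String> finalizers = new ArrayList<>();

    static FilterSet parse(Properties props) {
        // A TreeMap keyed by ordinal sorts numerically, so 10 comes after 9.
        TreeMap<Integer, String> req = new TreeMap<>();
        TreeMap<Integer, String> resp = new TreeMap<>();
        TreeMap<Integer, String> fin = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Test the finalizer prefix first: it also starts with "filter.response.".
            if (key.startsWith(FINALIZER)) {
                put(fin, key, FINALIZER, props);
            } else if (key.startsWith(RESPONSE)) {
                put(resp, key, RESPONSE, props);
            } else if (key.startsWith(REQUEST)) {
                put(req, key, REQUEST, props);
            }
        }
        FilterSet set = new FilterSet();
        set.request.addAll(req.values());
        set.response.addAll(resp.values());
        set.finalizers.addAll(fin.values());
        return set;
    }

    private static void put(TreeMap<Integer, String> target, String key, String prefix, Properties props) {
        // Throws NumberFormatException for a non-numeric ordinal such as "filter.request.first".
        int ordinal = Integer.parseInt(key.substring(prefix.length()));
        target.put(ordinal, props.getProperty(key).trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(String.join("\n",
                "filter.request.0=com.me.filter.MyFirstRequestFilter",
                "filter.response.1=org.tigris.noodle.filter.HandleContentType",
                "filter.response.0=org.tigris.noodle.filter.CheckForRedirect",
                "filter.response.finalizer.0=com.me.MyFirstFinalizerFilter")));
        FilterSet set = parse(props);
        System.out.println(set.request);    // [com.me.filter.MyFirstRequestFilter]
        System.out.println(set.response);   // [org.tigris.noodle.filter.CheckForRedirect, org.tigris.noodle.filter.HandleContentType]
        System.out.println(set.finalizers); // [com.me.MyFirstFinalizerFilter]
    }
}
```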

Setup Methods

You can call any of these NoodleData methods before calling proxyRequest, or within the filter method of any NoodleRequestFilter. Calling them in a response filter does no good, because by the time response filters run, the proxy connection has already been established and the request has already been sent.

  1. You can change the URL being used to access the proxy using the setURL method.
  2. You can set the HTTP headers to be sent to the proxy using the setHeadersToSend method.
  3. You can set the POST data to be sent to the proxy using the setPostData method, and set the GET data to be sent to the proxy using the setQueryData method.

Noodle Filters

Filters are the meat of Noodle; they allow you to change the proxy request and response in arbitrary ways invisible to the client. There are three types of filters: proxy request filters, proxy response filters, and proxy response finalizer filters.

Filters are executed by NoodleData in the order specified in the filter set.

Note that, like servlets, filters must be implemented in a thread-safe manner. If your filter has state that needs to persist between data blocks, store it with the putValue method of NoodleData and retrieve it later with getValue.
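A minimal sketch of this pattern: the filter keeps no mutable fields, and its running state lives in the per-request store. RequestValues is a hypothetical stand-in for NoodleData's getValue/putValue (the real signatures may differ), and onBlock is a simplified, hypothetical version of the filter method that receives only a block length.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for NoodleData's per-request value store; real getValue/putValue signatures may differ. */
class RequestValues {
    private final Map<String, Object> values = new HashMap<>();

    Object getValue(String key) { return values.get(key); }
    void putValue(String key, Object value) { values.put(key, value); }
}

/** One filter instance may serve many requests concurrently, so it has no mutable fields. */
final class ByteCountingFilter {
    // Prefix the key with the class name so it cannot collide with another filter's state.
    private static final String KEY = ByteCountingFilter.class.getName() + ".bytesSeen";

    /** Called once per block; returns the running total for this request only. */
    long onBlock(RequestValues data, int blockLength) {
        Object previous = data.getValue(KEY);
        long total = (previous == null ? 0L : (Long) previous) + blockLength;
        data.putValue(KEY, total); // autoboxed to Long
        return total;
    }

    public static void main(String[] args) {
        ByteCountingFilter shared = new ByteCountingFilter();
        RequestValues requestA = new RequestValues();
        RequestValues requestB = new RequestValues();
        shared.onBlock(requestA, 4096);
        shared.onBlock(requestB, 100);
        System.out.println(shared.onBlock(requestA, 5000)); // 9096
        System.out.println(shared.onBlock(requestB, 20));   // 120
    }
}
```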

Sample filters

Several proxy response filters are distributed with Noodle to serve as examples. Together with Noodle, these filters provide almost all of the functionality of a standard HTTP proxy for GET and POST requests.

  • CheckForRedirect will propagate a redirect from the proxy response into the client response.
  • CopyCookies will copy over Set-Cookie headers from the proxy response to the client response.
  • CopyHeaders will copy all HTTP headers from the proxy response to the client response.
  • HandleContentType will propagate the content type of the proxy response to the client response, and will disable further filtering if the content type is not "text/html".
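As an illustration of the kind of work these filters do, here is a sketch in the spirit of CopyCookies; it is not the distributed filter's source. It assumes proxy headers shaped like java.net.URLConnection.getHeaderFields() and takes the client side as a BiConsumer, so it could be given response::addHeader from an HttpServletResponse. It copies values verbatim; whether the real filter also rewrites cookie attributes such as Domain or Path is not covered here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Sketch: copy every Set-Cookie header from the proxy response to the client response. */
final class CookieCopier {
    /**
     * @param proxyHeaders headers of the proxied response, shaped like URLConnection.getHeaderFields()
     * @param addHeader    sink for client headers, e.g. response::addHeader
     * @return the number of Set-Cookie headers copied
     */
    static int copySetCookies(Map<String, List<String>> proxyHeaders, BiConsumer<String, String> addHeader) {
        int copied = 0;
        for (Map.Entry<String, List<String>> e : proxyHeaders.entrySet()) {
            // Header names are case-insensitive; the null key holds the status line.
            if (e.getKey() != null && e.getKey().equalsIgnoreCase("Set-Cookie")) {
                for (String value : e.getValue()) {
                    addHeader.accept("Set-Cookie", value); // add, not set: keep every cookie
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, List<String>> proxyHeaders = new HashMap<>();
        proxyHeaders.put(null, List.of("HTTP/1.1 200 OK")); // status line, as URLConnection reports it
        proxyHeaders.put("Set-Cookie", List.of("session=abc; Path=/", "theme=dark"));
        proxyHeaders.put("Content-Type", List.of("text/html"));
        int copied = copySetCookies(proxyHeaders, (name, value) -> System.out.println(name + ": " + value));
        System.out.println(copied + " cookie(s) copied");
    }
}
```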

Request filters

Request filters should implement the NoodleRequestFilter interface. Their filter method is called once, before the proxy request is sent. There is no data to stream yet, since the proxy has not responded. However, you can call any of the setup methods from within a request filter.
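A sketch of a request filter that uses setURL to send some requests to a different server. ProxySetup and RequestFilterSketch are hypothetical stand-ins for NoodleData and NoodleRequestFilter; the getURL getter, the void return type of filter, and setURL taking a String are all assumptions.

```java
import java.net.URI;

/** Stand-in for the NoodleData setup methods used below; real signatures may differ. */
class ProxySetup {
    private String url;

    ProxySetup(String url) { this.url = url; }
    String getURL() { return url; }              // hypothetical getter
    void setURL(String url) { this.url = url; }
}

/** Stand-in for NoodleRequestFilter: called once, before the proxy request is sent. */
interface RequestFilterSketch {
    void filter(ProxySetup data);
}

/** Hypothetical filter: send /source/... requests to an internal mirror instead. */
final class MirrorRequestFilter implements RequestFilterSketch {
    private final String mirrorBase; // immutable, so one instance can be shared across threads

    MirrorRequestFilter(String mirrorBase) { this.mirrorBase = mirrorBase; }

    @Override
    public void filter(ProxySetup data) {
        URI uri = URI.create(data.getURL());
        String path = uri.getRawPath();
        if (path != null && path.startsWith("/source/")) {
            String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
            data.setURL(mirrorBase + path + query); // takes effect because nothing has been sent yet
        }
    }

    public static void main(String[] args) {
        ProxySetup data = new ProxySetup("http://www.foo.com/source/browse/Main.java?rev=2");
        new MirrorRequestFilter("http://127.0.0.1:8080").filter(data);
        System.out.println(data.getURL()); // http://127.0.0.1:8080/source/browse/Main.java?rev=2
    }
}
```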

Response filters

Response filters should implement the NoodleResponseFilter interface. A response filter is called after the proxy request has been sent, once for every block of data read from the proxy response, until the filter elects not to read any more data.

Streaming data

The current data block is passed to the filter method as an instance of ResponseBlock. It contains a byte array with the data, a length, an offset (the first byte not yet streamed), and a String giving the expected encoding of the data (or null if the encoding is unknown).

For byte streams, block sizes are regular: between 0 and 8192 bytes, with most blocks either exactly 4096 bytes or between 4097 and 8192 bytes. A block is smaller than 4096 bytes if and only if the entire response is smaller than 4096 bytes; otherwise every block is 4096 bytes except the last, which is 4097 to 8192 bytes. For character streams, block sizes may be more irregular, because characters are read in fixed-size chunks and then converted into bytes.

Your NoodleFilter is responsible for writing to the client response all the data it wants to write. If that data includes or replaces data in the current ResponseBlock, your NoodleFilter must also update the ResponseBlock's offset to the first byte it has not streamed (if you handle the entire ResponseBlock, this is the value of ResponseBlock.getLength()). In simple cases, a method called streamBlockTo takes care of this for you.

Example: suppose you have a filter whose tedious task is to replace every "p" character with a "q" character. Further suppose the ResponseBlock block holds a byte array containing the following 30 bytes (the first line is a column ruler giving the last digit of each byte's index):

012345678901234567890123456789
Mahnamahna (bop be ba de bop!)

Your NoodleFilter should call streamBufferTo(block,13), which writes "Mahnamahna (bo" to the client response and sets the current offset (the first unstreamed byte) to 14. You should then write a "q" byte in place of the "p" and manually increment block's offset by one via block.setOffset(block.getOffset() + 1). A call to streamBufferTo(block,26) then writes " be ba de bo" to the client response and sets block's offset to 27. You should then write another "q" byte, increment block's offset again (to 28), and return, since there are no more "p" characters (see below for what to return). Do not stream more bytes than you need to: if any bytes remain unstreamed after all the filters have run on a block (here, "!)"), Noodle streams the remainder automatically.
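The walk-through above can be sketched as a filter. ResponseBlockSketch, the streamBufferTo helper, and the MAINTAIN_THIS_FILTER value are stand-ins that follow the behavior described in this section (inclusive end index, offset advanced past the last byte written); they are not Noodle's own classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Minimal stand-in for ResponseBlock: bytes, length and the first unstreamed offset. */
final class ResponseBlockSketch {
    final byte[] bytes;
    private final int length;
    private int offset;

    ResponseBlockSketch(byte[] bytes, int length) {
        this.bytes = bytes;
        this.length = length;
    }

    int getLength() { return length; }
    int getOffset() { return offset; }
    void setOffset(int offset) { this.offset = offset; }
}

/** The "p" to "q" filter from the example above. */
final class PToQFilter {
    static final int MAINTAIN_THIS_FILTER = 0; // stand-in for the NoodleFilter constant

    /** Stand-in for streamBufferTo: writes bytes offset..last (inclusive), then sets offset to last + 1. */
    static void streamBufferTo(ResponseBlockSketch block, int last, OutputStream out) throws IOException {
        int from = block.getOffset();
        out.write(block.bytes, from, last + 1 - from);
        block.setOffset(last + 1);
    }

    int filter(ResponseBlockSketch block, OutputStream out) throws IOException {
        for (int i = block.getOffset(); i < block.getLength(); i++) {
            if (block.bytes[i] == 'p') {
                if (i > block.getOffset()) {
                    streamBufferTo(block, i - 1, out); // everything before this "p"
                }
                out.write('q');                         // the replacement byte
                block.setOffset(block.getOffset() + 1); // step over the replaced "p"
            }
        }
        return MAINTAIN_THIS_FILTER; // later blocks may contain more "p"s
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Mahnamahna (bop be ba de bop!)".getBytes(StandardCharsets.US_ASCII);
        ResponseBlockSketch block = new ResponseBlockSketch(data, data.length);
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        new PToQFilter().filter(block, client);
        // Noodle's part: after all filters have run, stream the untouched tail ("!)").
        client.write(data, block.getOffset(), block.getLength() - block.getOffset());
        System.out.println(client.toString("US-ASCII")); // Mahnamahna (boq be ba de boq!)
    }
}
```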

If you need to do things with the input that can't be done on bytes, such as regular expression matching, you are free to convert the bytes to a String (using the encoding supplied with the block, block.getEncoding(), when it is available), modify it, and write the modified string to the response. Just be sure to update the block's offset to point to the first byte you are not handling.
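A minimal sketch of the String-based approach, assuming a stand-in block (EncodedBlock, hypothetical) whose getEncoding() returns null when the encoding is unknown. The ISO-8859-1 fallback is this sketch's choice: it maps bytes one-to-one, so text in an unknown encoding passes through unchanged apart from the ASCII matches. A match that straddles two blocks is not handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Minimal stand-in for ResponseBlock; getEncoding() returns null when unknown. */
final class EncodedBlock {
    final byte[] bytes;
    private final String encoding;
    private int offset; // first byte not yet streamed

    EncodedBlock(byte[] bytes, String encoding) {
        this.bytes = bytes;
        this.encoding = encoding;
    }

    int getLength() { return bytes.length; }
    String getEncoding() { return encoding; }
    int getOffset() { return offset; }
    void setOffset(int offset) { this.offset = offset; }
}

/** Hypothetical filter: regex-rewrite the unstreamed part of each block. */
final class RegexRewriteFilter {
    private final Pattern pattern;
    private final String replacement; // Matcher.replaceAll syntax: "$1" refers to a group

    RegexRewriteFilter(String regex, String replacement) {
        this.pattern = Pattern.compile(regex);
        this.replacement = replacement;
    }

    void filter(EncodedBlock block, OutputStream out) throws IOException {
        // Prefer the block's encoding; ISO-8859-1 is only this sketch's fallback.
        Charset cs = block.getEncoding() != null ? Charset.forName(block.getEncoding()) : StandardCharsets.ISO_8859_1;
        int from = block.getOffset();
        String text = new String(block.bytes, from, block.getLength() - from, cs);
        out.write(pattern.matcher(text).replaceAll(replacement).getBytes(cs));
        block.setOffset(block.getLength()); // the rest of the block was handled here
    }

    public static void main(String[] args) throws IOException {
        EncodedBlock block = new EncodedBlock(
                "<a href=\"http://old.example.com/caf\u00e9\">".getBytes(StandardCharsets.UTF_8), "UTF-8");
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        new RegexRewriteFilter("old\\.example\\.com", "new.example.com").filter(block, client);
        System.out.println(new String(client.toByteArray(), StandardCharsets.UTF_8));
    }
}
```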

Filter Status Codes

Your filter method must return one of three status codes defined in NoodleFilter, so that Noodle knows what to do with the filter on the next block.

  • MAINTAIN_THIS_FILTER: Do not change the state of this filter. It will be called on the next block of data to be streamed, assuming there is one.
  • KILL_THIS_FILTER: Do not run this filter on any more data.
  • KILL_ALL_FILTERS: Disable filtering for the remainder of this request. All unstreamed bytes in the proxy response will be streamed directly to the client response without going through any filters (even finalizer filters).
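The sketch below shows how one filter might use all three codes: it returns KILL_ALL_FILTERS for non-HTML content, inserts a banner after the first <body> tag and then retires with KILL_THIS_FILTER, and otherwise returns MAINTAIN_THIS_FILTER to look at the next block. The numeric values of the constants, the HtmlBlock stand-in, and the banner markup are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Minimal stand-in for ResponseBlock. */
final class HtmlBlock {
    final byte[] bytes;
    final int length;
    int offset; // first byte not yet streamed

    HtmlBlock(byte[] bytes) {
        this.bytes = bytes;
        this.length = bytes.length;
    }
}

/** Hypothetical filter: insert a banner right after the first <body ...> tag. */
final class BannerFilter {
    // Stand-ins for the NoodleFilter constants; the real values are defined by Noodle.
    static final int MAINTAIN_THIS_FILTER = 0;
    static final int KILL_THIS_FILTER = 1;
    static final int KILL_ALL_FILTERS = 2;

    private static final byte[] BANNER =
            "<div class=\"banner\">Served via Noodle</div>".getBytes(StandardCharsets.US_ASCII);

    int filter(String contentType, HtmlBlock block, OutputStream out) throws IOException {
        if (contentType == null || !contentType.toLowerCase(Locale.ROOT).startsWith("text/html")) {
            return KILL_ALL_FILTERS; // not HTML: let the rest stream through untouched
        }
        // ISO-8859-1 maps byte i to char i, so string indexes are byte indexes.
        String text = new String(block.bytes, 0, block.length, StandardCharsets.ISO_8859_1);
        int tag = text.toLowerCase(Locale.ROOT).indexOf("<body", block.offset);
        int close = tag < 0 ? -1 : text.indexOf('>', tag);
        if (close < 0) {
            // No complete <body> tag here (a tag split across blocks is not detected).
            return MAINTAIN_THIS_FILTER;
        }
        out.write(block.bytes, block.offset, close + 1 - block.offset); // up to and including '>'
        block.offset = close + 1;
        out.write(BANNER);
        return KILL_THIS_FILTER; // done; no need to look at later blocks
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><BODY class=x>hi</body>".getBytes(StandardCharsets.US_ASCII);
        HtmlBlock block = new HtmlBlock(html);
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        int status = new BannerFilter().filter("text/html; charset=UTF-8", block, client);
        client.write(html, block.offset, block.length - block.offset); // Noodle streams the rest
        System.out.println(status + " " + client.toString("US-ASCII"));
    }
}
```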

Response finalizer filters

These are just like response filters (they must implement NoodleResponseFilter), except that they are run only once, on the last block of data read from the proxy response. They run on the last block after all the regular response filters that are still alive have had a shot at it. Whatever value they return from filter is ignored.
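To tie the status codes and finalizers together, here is a sketch of a driver loop that follows the rules in this section. It is not Noodle's driver: blocks are simplified to strings, streaming is omitted, and the Filter interface and constant values are stand-ins.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch of how status codes and finalizers could be applied; not Noodle's actual driver. */
final class FilterChainSketch {
    // Stand-ins for the NoodleFilter constants.
    static final int MAINTAIN_THIS_FILTER = 0;
    static final int KILL_THIS_FILTER = 1;
    static final int KILL_ALL_FILTERS = 2;

    /** Simplified filter: sees a block (as text) and returns a status code. */
    interface Filter {
        int filter(String block);
    }

    static void run(List<String> blocks, List<Filter> responseFilters, List<Filter> finalizers) {
        List<Filter> live = new ArrayList<>(responseFilters);
        for (int i = 0; i < blocks.size(); i++) {
            String block = blocks.get(i);
            for (Iterator<Filter> it = live.iterator(); it.hasNext(); ) {
                int status = it.next().filter(block);
                if (status == KILL_THIS_FILTER) {
                    it.remove(); // skip this filter on later blocks
                } else if (status == KILL_ALL_FILTERS) {
                    // Per the manual, the remaining bytes go straight to the client;
                    // no further filters run, not even finalizers.
                    return;
                }
            }
            if (i == blocks.size() - 1) {
                for (Filter finalizer : finalizers) {
                    finalizer.filter(block); // return value ignored
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Filter once = b -> { log.add("once:" + b); return KILL_THIS_FILTER; };
        Filter always = b -> { log.add("always:" + b); return MAINTAIN_THIS_FILTER; };
        Filter fin = b -> { log.add("final:" + b); return KILL_ALL_FILTERS; };
        run(List.of("b1", "b2"), List.of(once, always), List.of(fin));
        System.out.println(log); // [once:b1, always:b1, always:b2, final:b2]
    }
}
```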

Good luck!

You are now ready to create your own Noodle filters! Have fun!


To learn more about Noodle in general, please see the Noodle Documentation.