Apache 2.0 Filters
Greg Ames, Jeff Trawick

Agenda
● Why filters?
● Filter data structures and utilities
● An example Apache filter
● Filter types
● Configuration directives

Agenda cont...
● Frequently used Apache filters
  – Output filters
  – Input filters
● Pitfalls
  – ways to avoid them
● mod_ext_filter
● Debugging hints
● The big filter list

Why filters?
● Same idea as Unix command line filters:
    ps ax | grep "apache.*httpd" | wc -l
● Allows independent, modular manipulations of HTTP data stream
  – if a 1.3 CGI creates SSI or PHP tags, they aren't parsed
  – CGI created SSI tags can be parsed in 2.0
  – possible Zend issue w/PHP at present

The general idea
Data can be manipulated independently from how it's generated.
[Diagram; surviving labels: SSI, gzip]

Bucket brigades
A complex data stream that can be passed thru layered I/O without unnecessary copying.
SPECWeb99 uses dynamically generated headers and trailers around static files:
● header and trailer live in memory based buckets
● file bucket contains the fd
● End Of Stream metadata bucket is the terminator
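The SPECWeb99 layout above (in-memory header, file bucket, in-memory trailer, End Of Stream terminator) can be pictured with a toy model. This is only a sketch in plain C: the `toy_*` types and functions are hypothetical names invented for illustration and are not the APR bucket API. It shows that a brigade is a list of typed buckets and that a file bucket carries only a descriptor and a length, so nothing is copied when it is appended.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT the APR definitions. */
typedef enum { TOY_HEAP, TOY_FILE, TOY_EOS } toy_bucket_type;

typedef struct toy_bucket {
    toy_bucket_type type;
    const char *data;          /* TOY_HEAP: bytes held in memory */
    int fd;                    /* TOY_FILE: descriptor only, contents not copied */
    size_t len;                /* payload length; 0 for the EOS marker */
    struct toy_bucket *next;
} toy_bucket;

typedef struct {
    toy_bucket *head;
    toy_bucket *tail;
} toy_brigade;

/* Append a bucket to the tail of the brigade; no payload is copied. */
static void toy_insert_tail(toy_brigade *bb, toy_bucket *b)
{
    assert(bb != NULL && b != NULL);
    b->next = NULL;
    if (bb->tail != NULL)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
}

/* Sum payload lengths up to the EOS bucket.
 * Returns 1 if an EOS terminator was found, 0 if the brigade is unterminated. */
static int toy_length(const toy_brigade *bb, size_t *total)
{
    const toy_bucket *b;

    assert(bb != NULL && total != NULL);
    *total = 0;
    for (b = bb->head; b != NULL; b = b->next) {
        if (b->type == TOY_EOS)
            return 1;
        *total += b->len;
    }
    return 0;
}
```

Appending a 6-byte heap bucket, a 1000-byte file bucket and a 7-byte heap bucket gives a total of 1013 bytes; `toy_length` reports the stream as complete only once an EOS bucket has been appended.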

Filter utilities and structures
● ap_register_[input|output]_filter
  – creates ap_filter_rec_t
● ap_add_[input|output]_filter[_handle]
  – creates ap_filter_t
● ap_pass_brigade
  – passes a bucket brigade to the next output filter
● ap_get_brigade
  – gets a bucket brigade from the next input filter

ap_filter_t

    struct ap_filter_t {
        ap_filter_rec_t *frec;   /* description */
        void *ctx;               /* context (instance variables) */
        ap_filter_t *next;
        request_rec *r;          /* HTTP request info */
        conn_rec *c;             /* connection info */
    };

● created by ap_add_[input|output]_filter[_handle]
● the blue boxes on previous slide

ap_filter_rec_t

An example Apache filter
● mod_case_filter
● lives in modules/experimental
● mission: convert data to upper case

mod_case_filter

    static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
                                            apr_bucket_brigade *pbbIn)
    {
        request_rec *r = f->r;
        conn_rec *c = r->connection;
        apr_bucket *pbktIn;
        apr_bucket_brigade *pbbOut;

        pbbOut = apr_brigade_create(r->pool, c->bucket_alloc);

        /* iterates thru all the buckets */
        APR_BRIGADE_FOREACH(pbktIn, pbbIn)
        {
            const char *data;
            apr_size_t len;
            char *buf;
            apr_size_t n;
            apr_bucket *pbktOut;

            if (APR_BUCKET_IS_EOS(pbktIn)) {
                /* terminate output brigade */
                apr_bucket *pbktEOS = apr_bucket_eos_create(c->bucket_alloc);
                APR_BRIGADE_INSERT_TAIL(pbbOut, pbktEOS);
                continue;
            }

            /* read: morphs bucket into memory based type */
            apr_bucket_read(pbktIn, &data, &len, APR_BLOCK_READ);

            /* write: allocates buffer */
            buf = apr_bucket_alloc(len, c->bucket_alloc);
            for (n = 0; n < len; ++n)
                buf[n] = apr_toupper(data[n]);

            /* creates bucket */
            pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free,
                                             c->bucket_alloc);
            /* adds to output brigade */
            APR_BRIGADE_INSERT_TAIL(pbbOut, pbktOut);
        }

        /* passes brigade to next output filter */
        return ap_pass_brigade(f->next, pbbOut);
    }
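The per-bucket transformation at the heart of the filter can be tried outside the server. The sketch below is plain C with no APR dependency; `upcase_copy` is a hypothetical helper name, and `malloc` stands in for `apr_bucket_alloc`. It mirrors the pattern in the filter: the bytes returned by the read are left untouched and the upper-cased result goes into a freshly allocated buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Return a newly allocated upper-cased copy of data[0..len).
 * The input is not modified. The caller frees the result.
 * Returns NULL if allocation fails. */
static char *upcase_copy(const char *data, size_t len)
{
    char *buf;
    size_t n;

    assert(data != NULL || len == 0);
    buf = malloc(len ? len : 1);    /* avoid a zero-size allocation */
    if (buf == NULL)
        return NULL;
    for (n = 0; n < len; ++n) {
        /* toupper() needs a value representable as unsigned char */
        buf[n] = (char)toupper((unsigned char)data[n]);
    }
    return buf;
}
```

Writing into a new buffer rather than in place matters because the data behind a bucket may be shared or read-only, which is presumably why the filter allocates its own output buffer.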

Filter types
● determines where filter is inserted
● AP_FTYPE_RESOURCE (SSI, PHP, case filter)
● AP_FTYPE_CONTENT_SET (deflate, cache)
● AP_FTYPE_PROTOCOL (HTTP)
● AP_FTYPE_TRANSCODE (chunk)
● AP_FTYPE_NETWORK (core_in, core_out)
● see include/util_filter.h for details
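How a type can determine the insertion point is easiest to see with a small ordered list. The following is a toy sketch, not httpd's implementation: `toy_filter`, `toy_add_filter` and the numeric ranks are hypothetical, and only the relative order of the ranks is meant to match the list above (include/util_filter.h remains the authority for the real values). Each new filter is placed before the first filter of a strictly higher type, so filters of the same type keep the order in which they were added.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ranks; only their relative order matters here. */
enum {
    TOY_RESOURCE    = 10,
    TOY_CONTENT_SET = 20,
    TOY_PROTOCOL    = 30,
    TOY_TRANSCODE   = 40,
    TOY_NETWORK     = 60
};

typedef struct toy_filter {
    const char *name;
    int ftype;
    struct toy_filter *next;
} toy_filter;

/* Insert f before the first filter whose type is strictly higher. */
static void toy_add_filter(toy_filter **chain, toy_filter *f)
{
    assert(chain != NULL && f != NULL);
    while (*chain != NULL && (*chain)->ftype <= f->ftype)
        chain = &(*chain)->next;
    f->next = *chain;
    *chain = f;
}
```

Adding core_out, deflate, the case filter and chunk in that order still yields the chain case filter, deflate, chunk, core_out, because the type decides the position and the order of the calls does not.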

Filter configuration directives

SetOutputFilter
● activates the named filter in this scope:
    SetOutputFilter INCLUDES
● SetInputFilter is the same for input

AddOutputFilter
● like SetOutputFilter, but uses extension:
    AddOutputFilter INCLUDES;DEFLATE html
● AddInputFilter is the same for input

RemoveOutputFilter
● resets filter to extension mappings:
    RemoveOutputFilter html
  removes .html filters in this subdirectory
● RemoveInputFilter is the same for input

Frequently used Apache filters

Output filters
[Diagram of the output filter chain; surviving annotations:]
<-- removes itself if no Range: header
<-- generates headers from headers_out table
<-- writes to the network
<-- calculates total content length, if practical

Input filters
[Diagram of the input filter chain; surviving annotations:]
<-- sets socket timeouts
<-- reads socket buckets; length cop

Pitfalls

Excessive memory consumption
● what will happen if this filter is fed a 2G file and the client has a 9600 bps modem?
● solution:
  – limit your buffer sizes, or don't buffer
  – call ap_pass_brigade periodically
● ap_pass_brigade will block when socket buffers are full and downstream filters are well behaved

Holding up streaming output
● a CGI writes a "search in progress" message
● ...then does a lengthy database query
● filters are called before all the output exists
● filter sees a pipe bucket
● naïve approach will block reading the pipe
● client won't see the message

Holding up streaming output...
Specific solution (stolen from protocol.c::ap_content_length_filter):

    read_type = nonblocking;
    for each bucket e in the input brigade:
        if e is eos bucket, we're done;
        rv = e->read();
        read_type = nonblocking;
        if rv is APR_SUCCESS then process the data received;
        if rv is APR_EAGAIN:
            add flush bucket to end of brigade containing data already processed;
            ap_pass_brigade(data already processed);
            read_type = blocking;
        if rv is anything else:
            log error and return failure;
    ap_pass_brigade(data already processed);
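The "try non-blocking first, flush, then block" idea can be exercised with a small simulation of the CGI scenario. This is a sketch under stated assumptions, not httpd code: the `toy_*` names are hypothetical, a chunk's `ready` flag stands in for data being available on the pipe, and a blocking read is simulated by marking the chunk ready. The loop retries the same chunk after a would-block result, which is one reasonable reading of the pseudocode above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_NONBLOCK, TOY_BLOCK } toy_mode;
typedef enum { TOY_OK, TOY_EAGAIN } toy_status;

/* One piece of CGI output; ready == 0 means "not written to the pipe yet". */
typedef struct {
    const char *text;
    size_t len;
    int ready;
} toy_chunk;

typedef struct {
    size_t bytes_passed;          /* bytes handed to the next filter */
    size_t flushes;               /* flush buckets sent downstream */
    size_t bytes_at_first_flush;  /* bytes the client had after flush #1 */
} toy_stats;

/* Simulated bucket read. */
static toy_status toy_read(toy_chunk *c, toy_mode mode)
{
    if (!c->ready) {
        if (mode == TOY_NONBLOCK)
            return TOY_EAGAIN;    /* nothing available right now */
        c->ready = 1;             /* stand-in for waiting on the pipe */
    }
    return TOY_OK;
}

static void toy_filter_run(toy_chunk *chunks, size_t n, toy_stats *st)
{
    toy_mode mode = TOY_NONBLOCK;
    size_t pending = 0;           /* processed but not yet passed on */
    size_t i = 0;

    assert(chunks != NULL && st != NULL);
    st->bytes_passed = 0;
    st->flushes = 0;
    st->bytes_at_first_flush = 0;

    while (i < n) {
        if (toy_read(&chunks[i], mode) == TOY_EAGAIN) {
            /* Pass what we have plus a flush, then wait for more. */
            st->bytes_passed += pending;
            pending = 0;
            st->flushes++;
            if (st->flushes == 1)
                st->bytes_at_first_flush = st->bytes_passed;
            mode = TOY_BLOCK;
            continue;             /* retry the same chunk, blocking */
        }
        pending += chunks[i].len; /* "process the data received" */
        mode = TOY_NONBLOCK;
        i++;
    }
    st->bytes_passed += pending;  /* final pass after the loop */
}
```

With one ready chunk holding the 18-byte "search in progress" message followed by a chunk that is not ready yet, the run records exactly one flush, and all 18 bytes have been passed downstream before the filter starts to wait.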

Avoiding the pitfalls with a simple well-behaved output filter
Make sure that the filter doesn't:
● consume too much virtual memory by touching a lot of storage before passing partial results to the next filter
● break streaming output by always doing blocking reads on pipe buckets
● break streaming output by not sending down a flush bucket when the filter discovers that no more output is available for a while
● busy-loop by always doing non-blocking reads on pipe buckets

Design for simple well-behaved output filter

    read_mode = nonblock; output brigade = empty; output bytes = 0;
    foreach bucket in input brigade:
        if eos:
            move bucket to end of output brigade; ap_pass_brigade();
            return;
        if flush:
            move bucket to end of output brigade; ap_pass_brigade();
            output brigade = empty; output bytes = 0;
            continue with next bucket
        read bucket;
        if rv is APR_SUCCESS:
            read_mode = nonblock;
            process_bucket(); /* see next slide */
        if rv is APR_EAGAIN:
            add flush bucket to end of output brigade; ap_pass_brigade();
            output brigade = empty; output bytes = 0;
            read_mode = block;
        if output bytes > 8K:
            ap_pass_brigade();
            output brigade = empty; output bytes = 0;
    ap_pass_brigade();

Design for simple well-behaved output filter...

    process_bucket:
    if len is 0:
        move bucket to output brigade;
    while len > 0:
        n = min(len, 8K);
        get new buffer to hold output of processing n bytes;
        process the data;
        get heap bucket to represent new buffer;
        add heap bucket to end of output brigade;
        if output bytes > 8K:
            ap_pass_brigade();
            output brigade = empty; output bytes = 0;
        len = len - n;
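The memory bound that this design buys can be checked with a standalone sketch. It is plain C, not an Apache filter: `process_bounded`, `sink_fn`, `sink_stats` and `count_sink` are hypothetical names, the sink callback stands in for ap_pass_brigade, and upper-casing stands in for "process the data". Input is consumed in pieces of at most 8K and the pending output is handed on as soon as it exceeds 8K, so no more than 16K is ever held regardless of input size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define CHUNK 8192   /* the 8K threshold from the design */

typedef void (*sink_fn)(const char *buf, size_t len, void *ctx);

/* Example sink that only counts what it receives. */
typedef struct {
    size_t calls;
    size_t bytes;
    char last;       /* last byte seen, to spot-check the transformation */
} sink_stats;

static void count_sink(const char *buf, size_t len, void *ctx)
{
    sink_stats *st = ctx;
    st->calls++;
    st->bytes += len;
    if (len > 0)
        st->last = buf[len - 1];
}

/* Upper-case len bytes of data, passing results to sink in bounded pieces.
 * Returns the largest number of bytes that were ever pending at once. */
static size_t process_bounded(const char *data, size_t len,
                              sink_fn sink, void *ctx)
{
    char pending[2 * CHUNK];   /* at most CHUNK held + CHUNK newly added */
    size_t used = 0, peak = 0;

    assert(sink != NULL);
    while (len > 0) {
        size_t n = len < CHUNK ? len : CHUNK;
        size_t i;

        for (i = 0; i < n; ++i)
            pending[used + i] = (char)toupper((unsigned char)data[i]);
        used += n;
        if (used > peak)
            peak = used;
        if (used > CHUNK) {        /* "if output bytes > 8K" */
            sink(pending, used, ctx);
            used = 0;
        }
        data += n;
        len -= n;
    }
    if (used > 0)
        sink(pending, used, ctx);  /* final pass after the loop */
    return peak;
}
```

For a 20000-byte input the sink is called twice (16384 bytes, then 3616), and the peak pending size is 16384 bytes, the same bound that would apply to a 2G input.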

mod_ext_filter
● Allows command-line filters to act as Apache filters
● Useful tool when implementing native Apache filters
  – Quick prototype of desired function
  – Can trace a native filter being tested

Simple mod_ext_filter example
"tidy" up the HTML:

    ExtFilterDefine tidy-filter \
        cmd="/usr/local/bin/tidy"
    SetOutputFilter tidy-filter
    ExtFilterOptions LogStderr

mod_ext_filter and prototyping
Use a normal Unix filter to transform the response body
mod_ext_filter: perform some header transformations
mod_headers: add any required HTTP header fields
mod_setenvif: set environment variables to enable/disable the filter

mod_ext_filter header transformations
Content-Type set to value of outtype parameter (unchanged otherwise):
    ExtFilterDefine foo outtype=xxx/yyy
Content-Length preserved or removed (default), based on the presence of preservescontentlength parameter:
    ExtFilterDefine foo preservescontentlength

Using mod_headers to add headers

    ExtFilterDefine gzip mode=output \
        cmd=/bin/gzip
    SetOutputFilter gzip
    Header set Content-Encoding gzip

Enabling a filter via envvar

    ExtFilterDefine nodirtywords \
        cmd="/bin/sed s/damn/darn/g" \
        EnableEnv=sanitize_output
    SetOutputFilter nodirtywords
    SetEnvIf Remote_Host ceo.mycompany.com \
        sanitize_output

Tracing another filter
In httpd.conf:

    # Trace the data read and written by another filter
    # Trace the input:
    ExtFilterDefine tracebefore EnableEnv=trace_this_client \
        cmd="/usr/bin/tee /tmp/tracebefore"
    # Trace the output:
    ExtFilterDefine traceafter EnableEnv=trace_this_client \
        cmd="/usr/bin/tee /tmp/traceafter"
    SetEnvIf Remote_Addr trace_this_client
    SetOutputFilter tracebefore;some_other_filter;traceafter

General Apache debugging trick
● use the prefork MPM w/ 2 Listen statements
● ps axO ppid,wchan | grep httpd
  – The process to get the next connection is in poll
  – It will look unique (Linux: do_pol / schedu)
● attach debugger...gdb bin/httpd
● easy to script
● saves having to restart with -X
● can quickly detach/reattach

Filter debugging hints
● .gdbinit
  – dump_filters
  – dump_brigade
  – dump_bucket
● Breakpoints
  – core_[input|output]_filter
  – ap_[get|pass]_brigade

The big filter list
● doesn't include filters already covered
● mod_cache
  – has three filters
  – based on 1.3 mod_proxy cache
● mod_charset_lite
  – translates character encodings
  – essential on EBCDIC platforms

big list...
● mod_deflate
  – gzip/ungzip
● mod_include
  – SSI
● CHUNK
  – manages HTTP outbound chunking
● mod_logio
  – allows logging of total bytes sent/recv'd, incl. headers

big list...
● mod_headers
  – lets you set arbitrary HTTP headers
● proxy_ftp
  – filter to send HTML directory listing
● mod_ssl
  – low level interfaces to SSL libraries

big list...
● mod_bucketeer
  – debugging tool for filters/buckets/brigades
  – separates, flushes, and passes data stream
  – uses control characters to specify where
● subrequest filter
  – eats EOS bucket used to end subreq output stream
● old_write
  – feeds buffered output of ap_rput* into filter chain