One of my favorite (now retired) interview questions is: How would you write a webserver in a shell script? It may surprise you how easy it is. The thing that makes it easy is inetd, the “internet super server”. It wraps a TCP or UDP server around any command line application. Using a bit of dup magic, it makes the incoming connection available to any program via stdin/stdout.
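To make the stdin/stdout contract concrete, here is a minimal sketch. The function name `hello_server`, the script path, and the `nobody` user in the comment are assumptions for illustration and are not part of the original script. Because inetd hands the client socket to the program as stdin and stdout, the same code can be exercised with an ordinary pipe:

```shell
#!/bin/sh
# Hypothetical inetd.conf entry (path and user are assumptions):
#   http stream tcp nowait nobody /usr/local/bin/hello.sh hello.sh

# Under inetd, stdin and stdout are the client socket, so a "server"
# only has to read the request line and print a response.
hello_server() {
  read -r request_line
  printf 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n'
  printf 'You sent: %s\n' "$request_line"
}

# Simulate a client without inetd by piping a request in.
printf 'GET / HTTP/1.0\r\n\r\n' | hello_server
```

Nothing in the function knows about sockets, which is the whole appeal of the inetd approach.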
The tricky part is handling the incoming HTTP GET request. Thankfully, the request line is always the first line the client sends, so this is all you need to do to get it:
read request
The request line is now in $request, in the form of “GET /blah HTTP/1.0”. Most people would tell you to use awk or sed to get the filename (/blah) portion out of it, but even the standard Bourne shell nowadays has some nifty text manipulation tools built in. Using parameter expansion, you can strip a matching pattern from the beginning (#) or the end (%) of a value. This takes the request and gives you back the filename requested:
url="${request#GET }"
url="${url% HTTP/*}"
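As a quick check of what those two expansions do, here is a small sketch. The wrapper name `parse_url` is hypothetical; the expansions themselves are the ones shown above.

```shell
#!/bin/sh
# parse_url is a hypothetical helper name; the two expansions are the
# ones used in the article.
parse_url() {
  url="${1#GET }"        # drop the shortest leading match of "GET "
  url="${url% HTTP/*}"   # drop the shortest trailing match of " HTTP/..."
  printf '%s\n' "$url"
}

parse_url "GET /blah HTTP/1.0"
# prints: /blah

# Clients end the line with a carriage return; the trailing pattern
# swallows it because it comes after "HTTP/".
parse_url "$(printf 'GET /index.html HTTP/1.1\r')"
# prints: /index.html
```

Note that a request using another method, such as "POST /x HTTP/1.0", keeps its leading "POST " because the first pattern does not match, so a more careful script would probably want to reject anything that does not start with "GET ".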
The advantage of parsing the request this way is that the input is handled entirely by built-in expansion and is never re-evaluated as shell code. Evil characters such as ` or | have no effect here.
Another tricky part is returning the correct MIME type. I cheat here a bit by relying on a more recent addition to the file command: -bi to return a simplified file type (MIME). This won’t work in Mac OS X as it has an older version, but newer Linux or BSD versions allow you to do this:
% file -bi ~/*
writable, regular file, no read permission
application/x-not-regular-file
text/plain; charset=us-ascii
application/x-executable, for FreeBSD 8.0 (800008)
image/jpeg
image/png
application/octet-stream
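Not every line of that output is a usable Content-Type value, which is why the full script pipes it through cut and grep and falls back to text/plain. The sketch below isolates that filtering step so it can be tried without running file at all; the function name `clean_mime` is hypothetical, and the inputs are strings copied from the sample output.

```shell
#!/bin/sh
# clean_mime is a hypothetical helper that mirrors the script's
# "cut -d, -f1 | grep /" filter and its text/plain fallback.
clean_mime() {
  # Keep only the part before the first comma, and only if it looks
  # like type/subtype; otherwise leave mime empty.
  mime=$(printf '%s\n' "$1" | cut -d, -f1 | grep "/") || mime=""
  printf '%s\n' "${mime:-text/plain}"
}

clean_mime "application/x-executable, for FreeBSD 8.0 (800008)"
# prints: application/x-executable
clean_mime "writable, regular file, no read permission"
# prints: text/plain
clean_mime "text/plain; charset=us-ascii"
# prints: text/plain; charset=us-ascii
```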
Without the -bi shortcut, you could easily build a quick table mapping common file extensions to MIME types and look the type up there instead. I’ll leave the implementation as an exercise for the reader.
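For readers who want a head start on that exercise, one possible shape is a case statement keyed on the file extension. This is only a sketch: the function name `mime_for` and the handful of types listed are assumptions, not a complete table, and the match is case-sensitive.

```shell
#!/bin/sh
# mime_for is a hypothetical helper; the table is a small sample, not a
# complete registry of MIME types.
mime_for() {
  case "$1" in
    *.html|*.htm) echo "text/html" ;;
    *.txt)        echo "text/plain" ;;
    *.css)        echo "text/css" ;;
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png" ;;
    *.gif)        echo "image/gif" ;;
    *)            echo "application/octet-stream" ;;  # unknown extension
  esac
}

mime_for "/intarweb/index.html"
# prints: text/html
mime_for "/intarweb/archive.bin"
# prints: application/octet-stream
```

Like the parameter expansion trick, case is a shell built-in, so the lookup never hands the filename to an external program.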
The trickiest of all the issues is security. Because I use built-ins for all of the parsing and path checks, I’m hopefully intercepting any bad things before operations are run on them. You can of course suffer directory traversal attacks if you are not careful. I decided on this mechanism to check for them:
chdir $dir || handle_error 404 "File Not Found"
if [ "`echo "$PWD/" | grep "^$DOCUMENT_ROOT/"`" = "" ]; then
handle_error 400 "Bad Request"
fi
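The same idea can be sketched as a standalone function that is easy to test. This is a variation under stated assumptions, not the author's code: the name `inside_root` is hypothetical, it uses `cd` (the POSIX name) with `pwd -P` to resolve the directory, and it swaps grep for a case pattern so that the document root is compared literally and not as a regular expression.

```shell
#!/bin/sh
# inside_root is a hypothetical helper: it succeeds only if directory $2
# resolves to a location at or below document root $1.
inside_root() {
  root=$(cd "$1" 2>/dev/null && pwd -P) || return 1
  real=$(cd "$2" 2>/dev/null && pwd -P) || return 1
  case "$real/" in
    "$root"/*) return 0 ;;   # quoted, so the root is matched literally
    *)         return 1 ;;
  esac
}

# Demo with a throwaway tree (paths are illustrative only).
tmp=$(mktemp -d)
mkdir -p "$tmp/docroot/pub"
inside_root "$tmp/docroot" "$tmp/docroot/pub" && echo "pub: ok"
inside_root "$tmp/docroot" "$tmp/docroot/pub/../.." || echo "escape: rejected"
rm -rf "$tmp"
```

The trailing slash on both sides matters: without it, a sibling directory such as /intarweb-private would pass a prefix check against /intarweb.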
For logging, I use the logger(1) tool to output requests and errors to syslog. The client IP isn’t readily available to the shell script, but it’s fairly trivial to use lsof to find out what the incoming IP is.
No shell script will ever be ‘secure’, as there are many ways to trick one into doing bad things. If you were ever crazy enough to use a shell script as an internet server, I highly recommend running it in a chrooted environment, as another user, within some sort of system call trapping layer like systrace. Here is the full source:
#!/bin/sh
# intarweb - a webserver written in bourne shell. (c) 2007 Thomas Stromberg
# tested on FreeBSD 8.0-CURRENT
DOCUMENT_ROOT="/intarweb"
SYSLOG_FACILITY="local0"

handle_error() {
  echo -e "HTTP/1.1 $1 $2\r\nContent-Type: text/html\r\n\r"
  echo -e "<h1>$1: $2</h1>\r"
  logger -p ${SYSLOG_FACILITY}.warn "$url returned $1: $2"
  exit 5
}

check_security() {
  if [ -d $1 ]; then
    dir=$1
  else
    dir=`dirname $1`
  fi
  chdir $dir 2>/dev/null || handle_error 404 "File Not Found"
  logger -p ${SYSLOG_FACILITY}.info "GET $url file=$1 dir=$dir pwd=$PWD"
  # Out of DOCUMENT_ROOT. Ack.
  if [ "`echo "$PWD/" | grep "^$DOCUMENT_ROOT/"`" = "" ]; then
    handle_error 400 "Bad Request"
  elif [ ! -r $1 ]; then
    handle_error 404 "File Not Found"
  fi
}

handle_file() {
  check_security $1
  if [ -f $1 ]; then
    mime=`/usr/bin/file -bi "$1" | cut -d, -f1 | grep "\/"`
    echo -e "HTTP/1.1 200 OK\r\nContent-Type: ${mime:-text/plain}\r\n\r"
    cat $1
    echo -e "\r"
  elif [ -d $1 ]; then
    handle_directory $1
  fi
}

handle_directory() {
  check_security $1
  if [ -f "${1}/index.html" ]; then
    handle_file "${1}/index.html"
  else
    echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r"
    echo -e "<h1>Contents of $url</h1><pre>"
    # TODO: does not work with spaces in filenames. sed gets \/ crazy.
    echo "<a href=\"..\">(Up a directory)</a>"
    ls -1 "$1" | xargs -n1 -I{} echo "<a href="$url/{}">{}</a>"
    echo -e "<" "/pre>\r"
  fi
}

# parse HTTP request. GET should always be the first line.
read request
url="${request#GET }"
url="${url% HTTP/*}"
handle_file "${DOCUMENT_ROOT}${url}"
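The TODO in the directory listing (filenames with spaces) could be approached with a glob loop in place of ls and xargs. The following is a sketch of one option and not a change the author made: the function name `list_directory` and its two arguments are assumptions, and it still does not HTML-escape or URL-encode the names.

```shell
#!/bin/sh
# list_directory is a hypothetical replacement for the ls | xargs line.
#   $1 = directory on disk, $2 = URL path shown to the client
list_directory() {
  echo "<pre>"
  for entry in "$1"/*; do
    [ -e "$entry" ] || continue   # empty directory: the glob stays literal
    name=${entry##*/}             # strip everything up to the last slash
    printf '<a href="%s/%s">%s</a>\n' "${2%/}" "$name" "$name"
  done
  echo "</pre>"
}

# Demo with a throwaway directory (paths are illustrative only).
tmp=$(mktemp -d)
: > "$tmp/my file.txt"
: > "$tmp/index.txt"
list_directory "$tmp" "/docs"
rm -rf "$tmp"
```

Because each name stays inside a quoted variable, spaces never reach word splitting, and the loop again relies only on built-ins.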
You can also download the source from http://stromberg.org/svn/repos/intarweb/intarweb.sh. Credit goes to “A web server in a shell script” for the expansion and MIME type shortcuts.