sprocket i/o

thomas stromberg on technology, nature, and motorcycles

Writing a web server in bourne shell

March 5th, 2008

One of my favorite (now retired) interview questions is: How would you write a web server in a shell script? It may surprise you how easy it is. What makes it easy is inetd, the “internet super server”. It wraps a TCP or UDP server around any command-line application: using a bit of dup magic, it makes the incoming connection available to the program via stdin/stdout.
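To make the stdin/stdout idea concrete, here is a minimal sketch. The inetd.conf entry in the comment uses a hypothetical install path and user, and serve_one is an illustrative name, not part of the script shown later. A pipe stands in for the socket, so it can be tried without inetd.

```shell
#!/bin/sh
# Hypothetical inetd.conf entry (path and user are assumptions):
#   http  stream  tcp  nowait  nobody  /usr/local/libexec/intarweb.sh  intarweb.sh
#
# inetd accepts the connection and runs the program with the socket attached
# to stdin and stdout, so the "server" is just a filter over those streams.
serve_one() {
  read -r request_line                        # first line sent by the client
  printf 'HTTP/1.0 200 OK\r\n'
  printf 'Content-Type: text/plain\r\n\r\n'   # blank line ends the headers
  printf 'You sent: %s\n' "$request_line"
}

# A pipe plays the role of the incoming connection.
printf 'GET /hello HTTP/1.0\r\n\r\n' | serve_one
```

Anything written to stdout goes back to the client, which is why the rest of the script needs nothing more exotic than echo and cat.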

The tricky part is handling the incoming HTTP GET request. Thankfully, the request line is always the first line the client sends, so this is all you need to do to get it:

read -r request

The request line is now in $request, in the form of “GET /blah HTTP/1.0”. Most people would tell you to use awk or sed to get the filename (/blah) portion out of it, but even the standard Bourne shell nowadays has some nifty text manipulation tools built in. Using parameter expansion, you can strip a matching pattern from the beginning of a value (#) or from the end of it (%). This takes the request and gives you back the filename requested:

url="${request#GET }"
url="${url% HTTP/*}"

The advantage of parsing the request this way is that the input is never handed to another command or re-evaluated by the shell. Evil characters such as ` or | have no special effect here, as long as the result stays quoted wherever it is used later.
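The two expansions can be tried in isolation. In this small sketch, parse_url is a hypothetical wrapper name and the sample request lines are made up for illustration.

```shell
#!/bin/sh
# parse_url: hypothetical helper wrapping the two expansions shown above.
parse_url() {
  url="${1#GET }"        # remove the shortest leading match of "GET "
  url="${url% HTTP/*}"   # remove the shortest trailing match of " HTTP/<anything>"
  printf '%s\n' "$url"
}

cr=$(printf '\r')
parse_url "GET /blah HTTP/1.0${cr}"   # prints /blah (the trailing CR is matched by *)
parse_url 'GET /a b.txt HTTP/1.1'     # prints /a b.txt (spaces in the path survive)
parse_url 'POST /form HTTP/1.0'       # prints POST /form (non-GET prefix is left alone)
```

The last case shows that anything other than a GET falls through with its method still attached, so the resulting path simply will not exist under the document root.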

Another tricky part is returning the correct MIME type. I cheat here a bit by relying on a more recent addition to the file command: -bi to return a simplified file type (MIME). This won’t work in Mac OS X as it has an older version, but newer Linux or BSD versions allow you to do this:

% file -bi ~/*
writable, regular file, no read permission
application/x-not-regular-file
text/plain; charset=us-ascii
application/x-executable, for FreeBSD 8.0 (800008)
image/jpeg
image/png
application/octet-stream

Without the -bi shortcut, you could build a small table mapping common file extensions to MIME types and look the type up there instead. I’ll leave the implementation as an exercise for the reader.
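One possible take on that exercise is a case statement keyed on the file extension. The function name mime_for and the choice of extensions are assumptions for illustration; extend the table as needed.

```shell
#!/bin/sh
# mime_for: hypothetical lookup from file name to MIME type.
# Matching is case-sensitive, so "PHOTO.JPG" falls through to the default.
mime_for() {
  case "$1" in
    *.html|*.htm) echo "text/html" ;;
    *.txt)        echo "text/plain" ;;
    *.css)        echo "text/css" ;;
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png" ;;
    *.gif)        echo "image/gif" ;;
    *)            echo "application/octet-stream" ;;
  esac
}

mime_for /intarweb/index.html    # prints text/html
mime_for /intarweb/logo.png      # prints image/png
mime_for /intarweb/archive.tar   # prints application/octet-stream
```

Unlike file -bi, this never opens the file, so it is faster but trusts the extension.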

The trickiest of all the issues is security. Because I use built-ins for all of the parsing and path checks, I’m hopefully intercepting anything bad before any operations are run on it. You can of course still suffer directory traversal attacks if you are not careful. I decided on this mechanism to check for them:

  cd "$dir" || handle_error 404 "File Not Found"
  if [ "`echo "$PWD/" | grep "^$DOCUMENT_ROOT/"`" = "" ]; then
    handle_error 400 "Bad Request"
  fi
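The same idea can be written without grep. This is a sketch of a variant, not the check used in the script: inside_root is a hypothetical helper, it resolves symlinks with pwd -P, and it assumes the root is not / itself. The throwaway directory from mktemp -d is only there for the demonstration.

```shell
#!/bin/sh
# inside_root ROOT DIR: succeed only if DIR resolves to ROOT or below it.
# Both paths are resolved physically, so ".." and symlinks cannot sneak out.
inside_root() {
  root=$(cd "$1" 2>/dev/null && pwd -P) || return 1
  dir=$(cd "$2" 2>/dev/null && pwd -P) || return 1
  case "$dir/" in
    "$root"/*) return 0 ;;   # the trailing slash stops /root matching /rootkit
    *)         return 1 ;;
  esac
}

demo=$(mktemp -d)
mkdir -p "$demo/www/docs"
inside_root "$demo/www" "$demo/www/docs"       && echo "docs: allowed"
inside_root "$demo/www" "$demo/www/docs/../.." || echo "escape: rejected"
rm -rf "$demo"
```

Quoting "$root" inside the case pattern makes it match literally, so a document root containing characters that are special to grep or to globbing cannot change the outcome.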

For logging, I use the logger(1) tool to send requests and errors to syslog. The client IP isn’t readily available to the shell script, but it’s fairly trivial to use lsof to find out what the incoming IP is.
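As a rough sketch of the lsof approach: under inetd the socket is file descriptor 0, so one can ask lsof for the internet socket on that descriptor and pull the remote end out of its name field. The exact invocation below is an assumption and has not been checked against the original setup; both function names are hypothetical.

```shell
#!/bin/sh
# peer_from_lsof_name: hypothetical helper that extracts the remote address
# from an lsof name field such as "192.0.2.10:80->198.51.100.7:51234".
peer_from_lsof_name() {
  peer="${1##*->}"             # keep everything after the last "->"
  printf '%s\n' "${peer%:*}"   # drop the trailing ":port"
}

# client_ip: assumed lsof call (-a AND the filters, -p this pid, -d fd 0,
# -i internet sockets, -n/-P numeric output, -Fn name field only).
client_ip() {
  name=$(lsof -a -p "$$" -d 0 -i -n -P -Fn 2>/dev/null | sed -n 's/^n//p' | head -n 1)
  [ -n "$name" ] && peer_from_lsof_name "$name"
}

peer_from_lsof_name '192.0.2.10:80->198.51.100.7:51234'   # prints 198.51.100.7
```

The result could then be added to the logger calls, for example as a client= field, at the cost of one lsof run per request.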

No shell script will ever be ‘secure’, as there are a lot of ways that you can trick them into doing bad things. If you were ever crazy enough to use a shell script as an internet server, I highly recommend running it in a chrooted environment, as another user, within some sort of system call trapping layer like systrace. Here is the full source:

#!/bin/sh
# intarweb - a webserver written in bourne shell. (c) 2007 Thomas Stromberg
# tested on FreeBSD 8.0-CURRENT
DOCUMENT_ROOT="/intarweb"
SYSLOG_FACILITY="local0"
 
handle_error() {
  # printf behaves the same in every POSIX shell; echo -e does not.
  printf 'HTTP/1.1 %s %s\r\nContent-Type: text/html\r\n\r\n' "$1" "$2"
  printf '<h1>%s: %s</h1>\r\n' "$1" "$2"
  logger -p "${SYSLOG_FACILITY}.warn" "$url returned $1: $2"
  exit 5
}
 
check_security() {
  if [ -d "$1" ]; then
    dir=$1
  else
    dir=`dirname "$1"`
  fi
  cd "$dir" 2>/dev/null || handle_error 404 "File Not Found"
  logger -p "${SYSLOG_FACILITY}.info" "GET $url file=$1 dir=$dir pwd=$PWD"
  # Out of DOCUMENT_ROOT. Ack.
  if [ "`echo "$PWD/" | grep "^$DOCUMENT_ROOT/"`" = "" ]; then
    handle_error 400 "Bad Request"
  elif [ ! -r "$1" ]; then
    handle_error 404 "File Not Found"
  fi
}
 
handle_file() {
  check_security "$1"
  if [ -f "$1" ]; then
    mime=`/usr/bin/file -bi "$1" | cut -d, -f1 | grep "/"`
    printf 'HTTP/1.1 200 OK\r\nContent-Type: %s\r\n\r\n' "${mime:-text/plain}"
    # Send the body untouched; appending CRLF would corrupt binary files.
    cat "$1"
  elif [ -d "$1" ]; then
    handle_directory "$1"
  fi
}
 
handle_directory() {
  check_security "$1"
  if [ -f "${1}/index.html" ]; then
    handle_file "${1}/index.html"
  else 
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n'
    printf '<h1>Contents of %s</h1><pre>\n' "$url"
    # TODO: names are not HTML- or URL-escaped.
    echo "<a href=\"..\">(Up a directory)</a>"
    # Read names line by line so spaces in filenames survive.
    ls -1 "$1" | while IFS= read -r name; do
      printf '<a href="%s/%s">%s</a>\n' "${url%/}" "$name" "$name"
    done
    printf '</pre>\r\n'
  fi
}
 
# parse HTTP request. GET should always be the first line. 
read -r request
url="${request#GET }"
url="${url% HTTP/*}"
handle_file "${DOCUMENT_ROOT}${url}"

You can also download the source from http://stromberg.org/svn/repos/intarweb/intarweb.sh. Credit goes to “A web server in a shell script” for the expansion and MIME type shortcuts.

Tags: technology
