I was just reading a message in the XKCD forums, and noticed there were all these comments about how the "#!" was 'interpreted by the shell'. Given that XKCD readers are pretty technical, this surprised me.
Actually, the hashbang is interpreted by the kernel. The hashbang was invented in Version 7 UNIX, as a result of the split between East Coast UNIX and West Coast UNIX.
In Version 6 UNIX there was only one shell. It was a pretty horrible shell, too. It didn't have control structures. It didn't have comments. It didn't have file name expansion or wildcards - they were sort of transparently handled by having the shell run a separate program called "glob". You couldn't pipe into shell scripts because the shell interpreted scripts by reading them from standard input. There was no buffering, so it was possible to control the flow of a shell script by seeking around on standard input, looking for a line starting with ":" to indicate a label. ":" wasn't a comment marker, mind, it was a command that didn't do anything, like /bin/true. It may have even been a link to /bin/true.
On the east coast, this fellow at AT&T named Steve Bourne was coming up with a new shell that would end up in Version 7 UNIX.
On the west coast, there were two competing improved shells, the Pascal shell (which never took off) and the C shell.
The C shell had control structures and comments and all kinds of cool stuff. The comment character was "#". So when you ran a shell script it forked, opened the file, and looked at the first character, and if it was "#" it execed a new instance of the C shell to interpret it, otherwise it just execed the old /bin/sh with stdin set to the file. OK, that's a lot of jargon, let's just say that the first character was used by the C shell to determine what shell to use to run the script.
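Here's a rough sketch of that first-character check, just to make it concrete. It's not the C shell's actual code; the function name and the two paths are made up for illustration, and all it does is pick a shell name rather than fork and exec anything.

```python
# Hypothetical interpreter paths, assumed for this illustration only.
CSH = "/bin/csh"
SH = "/bin/sh"


def choose_shell(script_head: bytes) -> str:
    """Pick a shell from the first byte of a script.

    Mirrors the rule described above: a leading "#" means the script
    is meant for the C shell, anything else goes to the old /bin/sh.
    """
    if script_head[:1] == b"#":
        return CSH
    return SH


if __name__ == "__main__":
    print(choose_shell(b"# a csh script\nset x = 1\n"))
    print(choose_shell(b"echo hello\n"))
```

Note that this is a userland decision made by the shell itself, which is exactly what the hashbang later moved into the kernel.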
Now I'm going to digress into more jargon. When the kernel ran a program, it used the first two bytes of the program to determine how to run it. Originally it just loaded the program and jumped into location zero, but by the time of V6 and V7 there were a number of different things that had to be done to set up the program, so it opened up the file and read the first two bytes. If the first two bytes were a jump to one particular location, it knew it was one kind of program; if they were a jump to another location, it was another kind of program. So this jump instruction became a "magic number" that the kernel used to decide how to run the program. Other magic numbers got created, and they didn't have to be PDP-11 jump instructions (or even run on a PDP-11), but one number was as good as any other so those just happened to be kept.
Anyway, by this time in UNIX the first 16 bits of the program were a magic number... the kernel read it and used that to decide how to run a program. I could go into more details like "pure text" versus "split I&D", but it doesn't matter... the point is, the feature was there.
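To make the magic number idea concrete, here's a small sketch of the lookup. The octal values 0407, 0410 and 0411 are the classic PDP-11 a.out magic numbers as I remember them; the table, the descriptions and the function name are my own, and this only classifies a header, it doesn't load anything.

```python
import struct

# Classic PDP-11 a.out magic numbers (octal). The descriptions are
# informal labels for this sketch, not text from any kernel source.
FORMATS = {
    0o407: "normal",      # text and data loaded together, writable
    0o410: "pure text",   # read-only, shareable text segment
    0o411: "split I&D",   # separate instruction and data spaces
}


def classify(header: bytes) -> str:
    """Classify an executable by its first 16-bit word."""
    if len(header) < 2:
        return "too short"
    # The PDP-11 stores 16-bit words low byte first (little-endian).
    (magic,) = struct.unpack("<H", header[:2])
    return FORMATS.get(magic, "unknown")


if __name__ == "__main__":
    print(classify(b"\x07\x01"))  # bytes of the word 0407
    print(classify(b"\x09\x01"))  # bytes of the word 0411
```

The point is just that the kernel's decision is a table lookup on one word, so adding a new kind of "program" only means adding a new entry.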
So... in Version 7, they defined a new 16-bit magic number whose value looked like "#!" and said "this magic number is interpreted by reading a line of text, breaking it up into two words separated by a space, calling the interpreter whose name is in the first word, and passing the second word and the name of the file to that interpreter".
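Here's a minimal sketch of that rule: how the first line turns into the argument list the interpreter gets. It's not kernel code. The function name is made up, and I'm assuming that everything after the interpreter name is passed as a single argument, and that the argument is simply left out when the line has only one word.

```python
from typing import List, Optional


def parse_hashbang(head: bytes, script_path: str) -> Optional[List[str]]:
    """Build the argv an interpreter would receive for a "#!" script.

    Returns None if the file does not start with the "#!" magic number
    or names no interpreter.
    """
    if not head.startswith(b"#!"):
        return None
    # Only the first line matters; drop the two magic bytes.
    line = head[2:].split(b"\n", 1)[0].decode("ascii", "replace").strip()
    if not line:
        return None
    # First word is the interpreter; the remainder (if any) is one argument.
    parts = line.split(None, 1)
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1].strip())
    # The script's own name always goes last.
    argv.append(script_path)
    return argv


if __name__ == "__main__":
    print(parse_hashbang(b"#!/bin/sh -e\necho hi\n", "myscript"))
```

So a script called myscript that starts with "#!/bin/sh -e" ends up being run as if you had typed "/bin/sh -e myscript".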
Thus was the hashbang born. Like many great ideas it was simple, brilliant, and extraordinarily useful.