Introduction to E-Mail Technology
You depend on e-mail. You couldn't get your work done without it. Yet most users (not to mention IT professionals and managers) experience e-mail as a mysterious, magical function. You write a message on your computer, you click Send, and moments later, it appears in the recipient's inbox. Poof! E-mail happens invisibly. No creaking and groaning of IT infrastructure reminds you that e-mail delivery is actually a complex system with a lot of moving parts. Overall, that's a great success story—how many long-term IT services work so smoothly that users take them for granted? But if you have any responsibility for ensuring that the mail arrives, or for managing the hardworking e-mail administrators who do, it behooves you to know a minimum of the technology basics. This article centres on the technology of e-mail. It doesn't go into e-mail management, corporate policies or matters that involve human behaviour. (That subject is covered in a later article.) Nor does this article address the key issues to consider in the war against spam, though spam fighting represents a huge amount of an e-mail administrator's energy (and angst) these days; another article addresses what managers should know about spam fighting. Don't expect technical depth: This is, after all, an ABC, not the entire alphabet. Managers should, however, understand that a full conceptual explanation could easily fill 40 pages with dense technical definition; most of it is far more than I want to know, too. If e-mail is important for your business, however, you should have skilled people around who are up to the challenge. This article covers the underlying technology (or, if you prefer, the most essential of those magic spells), so you have some idea of how the process works, and thus what can go wrong.
How does e-mail get from the sender to the receiver?
Perhaps the first fundamental is that e-mail isn't handled by one kind of server or technology. It's a suite of protocols that are served by distinct processes. We'll look at those in a little more detail after the overview.
Let's say you've written a brilliant message in your e-mail client (the software application you use on your desktop to compose and organize messages, such as Microsoft Outlook, Apple Mail or Thunderbird). E-mail professionals call that client application the mail user agent (MUA). The MUA may not be a desktop application; it may be a "Web mail" application that runs on a Web server and which you control using your browser. Web mail clients, whether through Gmail, Yahoo or a corporate front end to another system (say, to Lotus Notes), are treated the same way as desktop client MUAs by the rest of the e-mail transport process.
When you click on the Send button, the message disappears from your screen... and sets an entire chain of events in motion. After you click Send, the message is transferred to your outgoing mail server, which is probably named something like mail.yourcompany.com. The mail server, formally called a mail transfer agent (MTA), knows to accept the message, either because you are in a network it trusts, or because you provided a username and password (generally stored in the MUA's configuration files). This network process is accomplished using the Simple Mail Transfer Protocol (SMTP), and the "make sure the sender is trustworthy" process is called authenticated SMTP.
With your brilliant message in hand (or in queue), your mail server needs to send it along. The mail server contacts the recipient's mail server and transfers the mail, again using SMTP. But of the millions of mail servers, which does it contact?
Your mail server does a lookup in the Domain Name System (DNS), which is a kind of library card catalog for the Internet, to find out who's signed up to accept mail for the recipient's domain. The DNS gives your mail server the mail exchange (MX) records (there can be more than one) that are registered for that domain. That gives your mail server the server to contact, and it can start on its "Hey, I've got mail for you" conversation.
The message is sent over the Internet, via TCP/IP (Transmission Control Protocol/Internet Protocol). Don't generalize and say "over the Web," here; while you can occasionally use the terms interchangeably, this isn't one of those times. Hearing you say this will make your techies wince.
The server-to-server communication process is somewhat different than it is when the server is talking to the client MUA, although both use SMTP. One difference is trust; between hard-coded programming and administrator settings, every mail server through which a message passes (and there may be several) has to assume that the message may be wrongly formatted (like the post office refusing a letter because it lacks a full street address) or, sadly more likely, that it breaks the rules in the pursuit of sending spam or viruses. Primarily because of spam, most mail servers put each message through a multistep process before they will even accept the data, much less store it and forward it to the user. Those steps are covered a little more below, and in some detail in Getting Clueful: Five Things You Should Know About Fighting Spam; for this broad overview, just be aware that messages can be lost or rejected for many reasons, not all of which are intended to cause you personal grief.
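As a rough illustration of the steps described so far, here is a minimal Python sketch, not a production mail client. The host names, the use of port 587 with STARTTLS, and the sample MX records are assumptions made for the example; Python's standard library has no MX lookup of its own, so the records are supplied by hand.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Compose a message the way an MUA does before handing it off."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def submit(msg: EmailMessage, host: str, user: str, password: str) -> None:
    """Hand the message to the outgoing mail server using authenticated SMTP.

    Port 587 with STARTTLS is a common submission setup and is assumed here.
    """
    with smtplib.SMTP(host, 587, timeout=30) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


def order_mx(records: list[tuple[int, str]]) -> list[str]:
    """Return MX hosts in the order a sending server would try them.

    Each record is (preference, hostname); the lowest preference goes first.
    """
    return [host for _preference, host in sorted(records)]


# Hypothetical MX records for the recipient's domain.
mx_records = [(20, "backup.yourcustomer.com"), (10, "mail.yourcustomer.com")]
message = build_message(
    "you@yourcompany.com", "reader@yourcustomer.com", "Hello", "Brilliant."
)
print(order_mx(mx_records))
```

Calling `submit(message, "mail.yourcompany.com", user, password)` would perform the actual hand-off; it is left out of the run above because it needs a real server and credentials. Real mail servers also fall back to the next MX host when the preferred one is unreachable, which this sketch does not attempt.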
Once the message arrives at the destination mail server, that is, the server responsible for delivering to the recipient (such as mail.yourcustomer.com), it's ready to distribute to the individual who is, presumably, anxiously waiting for a word from you. Here, too, there are choices for the mail administrator, particularly in how mail should be stored and forwarded to end users. Every organization (or its e-mail admins) decides which method best serves its needs. Most likely, the primary protocol used in your shop is the Internet Message Access Protocol (IMAP), which keeps all messages on the incoming mail server, neatly sorted into user folders. It's far more rare, nowadays, for companies to use the Post Office Protocol (POP3). Using POP3 e-mail, the "Get new mail" command in your MUA causes the application to download all messages to the local computer. Under most circumstances, the POP3 e-mail messages are then deleted on the mail server. The recipient presses "Get new mail" on her own MUA... and there is that brilliant message you wrote. Magic! By the end of the process, your e-mail message may travel around the world through five or six separate computers. But in many cases, your brilliant message arrives on the recipient's desktop in a minute or two. Is that cool, or what? All that happens quickly when the system works. But what happens when it doesn't?
How can e-mail be delayed or lost?
Early hype described the Internet as an "electronic superhighway," a phrase that grew dated faster than a 1970s olive-green polyester leisure suit. In this case, however, a network of highways and side roads is a useful analogy. If you encounter no traffic on the way to work, you can get to the office in, say, 20 minutes. But inevitably, the road is clogged with other cars, causing you to wait 5 minutes just to turn left at one intersection. Construction can make it impossible to take the usual route. Your car may break down. The trip takes a lot longer than 20 minutes.
The same things apply to e-mail traffic. Mail servers are fast, but messages can queue up under a heavy load. Internet traffic can require messages to be rerouted through paths that aren't obvious. Servers can lose connectivity with the Internet. Users can unplug network cables, "helpfully" change MUA settings (what were they thinking?), and decide blithely to send a 10MB PowerPoint file to 35 of their closest friends (and then demand to know why the message didn't arrive in nanoseconds).
It isn't common for your mail server to hand off your brilliant message to the recipient's mail server with just one "hop." Like the postal service, messages may pass from one place to another before they are delivered. Messages are handed from machine to machine in a "store and forward" model that may involve many computers, so the overall speed of delivery is highly variable. This also means that messages travel through computers that are not visible to or even known to the sender or recipient. The store and forward model is critical to the robustness of e-mail, because it permits secondary routes for mail to get from one place to another, and for technical practices that cope with failures by taking alternate paths or retrying to create a connection when a problem is encountered.
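One way to see the store-and-forward hops is to read the "Received" headers that each mail server adds to a message as it passes through. The following is a small Python sketch under stated assumptions: the sample message and its host names are invented for illustration, and real Received headers carry more detail than shown here.

```python
from email import message_from_string

# A made-up message; servers place their Received line at the top,
# so the newest hop appears first.
RAW_MESSAGE = """\
Received: from mx.yourcustomer.example by mailstore.yourcustomer.example; Tue, 1 Jan 2008 10:00:05 +0000
Received: from mail.yourcompany.example by mx.yourcustomer.example; Tue, 1 Jan 2008 10:00:03 +0000
Received: from desktop.yourcompany.example by mail.yourcompany.example; Tue, 1 Jan 2008 10:00:01 +0000
From: you@yourcompany.example
To: reader@yourcustomer.example
Subject: Brilliant message

Hello!
"""


def trace_hops(raw_message: str) -> list[str]:
    """Return the hops a message took, oldest first."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])
    hops = []
    # Reverse so the list reads in the order the message travelled.
    for header in reversed(received):
        # The text before ';' names the machines; the timestamp follows it.
        hops.append(header.split(";", 1)[0].strip())
    return hops


for number, hop in enumerate(trace_hops(RAW_MESSAGE), start=1):
    print(number, hop)
```

Comparing the timestamps on successive headers is a common first step when working out where a delayed message spent its time.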
And that's without reference to the "nasties" such as spam and viruses, the road rage of Internet traffic. In addition to consuming a vast amount of bandwidth (I could quote percentages, but any numbers I cite would be higher by the time you read this), spam, viruses and Trojan horses cause network admins to invest a lot of time and effort in building traps to prevent the bad stuff from reaching users' inboxes. Every gateway takes time, like a highway tollbooth that slows traffic.
Historically, mail servers were very forgiving of technical carelessness. But in the modern world, mail deliverability can be harmed by minor technical hygiene issues like inaccurate Domain Name System (DNS) records, ill-considered tuning of timeout parameters and unusual mail formatting. And then there's the burden placed on everyone for dealing with spam, phishing and viruses. There's no such thing as a perfect spam filter. They're getting better, sure, but you've probably encountered at least one situation in which a real e-mail message was stuffed wrongly into a spam folder.
Another barrier comes from misconfigured e-mail clients and servers (such as your own!) failing to follow the rules; more and more commonly, their mail is rejected by the recipient's mail server (which is not always kind enough to tell you). If that happens, the message is delayed or lost. This means that companies must enforce standards-based e-mail technology (such as ensuring that their servers adhere to the RFCs), and that users must be taught proper e-mail behavior (such as sending a message from the same server from which their e-mail ID originates).
What's the difference between all these protocols, like IMAP and POP, and why should I care?
It's time to get a little more techie. As mentioned earlier, e-mail can use a lot of Internet protocols (protocols being industry-standard methods of transmitting data), and it's helpful to be familiar with these, at least at a high level. At minimum, you have protocols used by inbound servers (what I've been calling "the recipient's server"), generally POP3 and IMAP. Outbound servers, the ones that are mailing messages elsewhere, use SMTP. A company may also have a separate authentication server (LDAP) and perhaps other pieces providing calendaring (often involving SQL databases), Web mail (uses Web browsers, for which the relevant protocols are HTTP and IMAP), and central storage of client configuration (ACAP).
The different protocols exist not because some programmer thought it would be cool to create one, but because each protocol serves a completely different need. For example, POP3 was designed to support lightweight, disconnected clients. IMAP provides server-based storage of mail folders. LDAP provides authentication not just for mail systems but for many other applications. And so on. Each solves a particular problem.
Most of these protocols are just, well, how the pieces fit together and no decisions are necessary or possible. One of the few instances where your company has an active choice is IMAP versus POP3. IMAP is more popular than POP3 nowadays, though both have their adherents. So let's take a quick look at their advantages and disadvantages.
IMAP has gained popularity because the mail stays on the server. Most mail clients (MUAs) permit users to sync the data to a local hard disk (a necessity for mobility, such as the omnipresent plane trips), but the messages' primary home is on the server.
IMAP makes administration far easier for IT managers since there's only one computer to back up and it's easier to control how much disk space is consumed by limiting mailbox size. Users appreciate the ability to access their e-mail from any computer using whichever MUA is convenient, and the fact that IMAP can store message state (such as whether an individual message was read or replied to) and keep Sent messages. Companies also have more access to the contents of the e-mail, which can be important for regulatory and compliance reasons (such as archiving mail, a topic we won't address here) but also irritating to some users.
IMAP (particularly SSL IMAP, which adds security features) can also enable more efficient bandwidth use. Instead of downloading the messages to the user's inbox, by default IMAP sends message headers (sender, recipient, subject line, etc.). Only the messages selected are sent to the inbox, and clients may retrieve the text portion without retrieving attached files.
However, IMAP has its downsides. If a corporation keeps all the mail on one server, and doesn't back it up (and test those backups) there's a single point of failure. Plus, e-mail messages can be huge, particularly with attachments or embedded images; many companies cope with this by creating rules about disk space (such as "maximum of 100MB"), which irk users who really do need more (or at least who believe they do). And, of course, IMAP e-mail isn't accessible without an active Internet connection or syncing with a local computer.
Many of the pro and con arguments for POP e-mail are the flip side of IMAP's. Because the messages are downloaded to an individual computer, the message box size is limited only by the users' available hard disk space, and messages are available anytime, but they're available only on that one computer, and if the disk crashes... oops.
It also gives the user an illusion of privacy, though while messages are stored on the server (until sent to the MUA) the company does have access to them. In any case, POP3 is widely used for dial-up connections (which, yes, do still exist) and it works with older e-mail clients (to which some users cling).
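The headers-first behavior described for IMAP can be sketched with Python's standard `imaplib` module. This is an illustrative sketch, not a full mail client: the choice of the INBOX folder, the "UNSEEN" search, and the two header fields requested are assumptions made for the example, and the host and credentials would be your own.

```python
import imaplib
from email import message_from_bytes


def summarize(header_bytes: bytes) -> tuple[str, str]:
    """Turn raw header bytes from the server into (sender, subject)."""
    msg = message_from_bytes(header_bytes)
    return msg.get("From", ""), msg.get("Subject", "")


def list_unread(host: str, user: str, password: str) -> list[tuple[str, str]]:
    """Fetch only the From and Subject headers of unread mail.

    Message bodies and attachments stay on the server.
    """
    summaries = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True avoids changing the read/unread state on the server.
        conn.select("INBOX", readonly=True)
        _status, data = conn.search(None, "UNSEEN")
        for num in data[0].split():
            _status, parts = conn.fetch(
                num, "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT)])"
            )
            summaries.append(summarize(parts[0][1]))
    return summaries


# summarize() can be tried without a server:
print(summarize(b"From: you@yourcompany.example\r\nSubject: Brilliant message\r\n\r\n"))
```

A POP3 client, by contrast, typically retrieves each whole message to the local machine, which is the trade-off the paragraphs above describe.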
How does spam filtering work?
It's one thing to say, "Get rid of spam, but don't lose any real mail." It's another to accomplish that goal. Problems can ensue from messages falsely declared to be spam, from messages falsely declared not to be spam, and from the annoyances that some spam-fighting methods can create for communication among business correspondents. In other words: It isn't perfect. It's necessary, much to everyone's dismay, but the technology is still a work in progress.
E-mail can be filtered at any point in the message-passing process. It's unlikely to happen on the outgoing side (presumably because spammers are aware of what they're doing). Inbound e-mail can be examined on the server (should the company decide to do so, and most enterprises do), and on the client (MUA).
At the e-mail server level, messages may be examined by appliances or dedicated software (which include antivirus tools), or with features built into the e-mail server itself (though some require customization or add-on utilities). Server antispam methods are wide-ranging. A small sample of the methods used includes:
When server filters work, it's less necessary to install a client-side filter. But not every company installs server-side filtering, or they do a less-than-conscientious job at maintaining the software (keeping up with it can be a full-time job). Subscribers to commercial ISPs have even less control. Fortunately, most e-mail client applications, both Web mail and desktop-based, include some kind of spam filtering, and you can purchase add-ons to sift through messages and sort the probable-unsavory into a specialized "unsure" folder or otherwise mark them for careful examination. (Doing so, however, requires that you actually examine the messages.) Here are a few of the methods used:
There's a lot more to e-mail than I've discussed in this brief overview. Among the unanswered topics are e-mail archiving, administering e-mail lists and encryption of e-mail messages. The above, however, should get you well on the way to understanding how the system works.