This draft document defines the format and handling of mailto URIs.
Last updated: November 13, 2009
The rules in this document are not restricted by the rules in RFC2368 or draft-duerst-mailto-bis.
Open issue - Need to split things up into strict authoring requirements and lax parsing/handling requirements. This is partially done on this page for now, which contains some updated info and is easier to follow. This document still contains info the other does not though.
Mailto URIs are used to specify email message compose data in a portable format. Mail clients parse and decode this data to generate default text field values for compose forms, which users can review before sending.
"mailto:?" + "&-separated list of hname=hvalue pairs"
All hname=hvalue pairs are optional.
"mailto:" and the TO, CC, BCC, SUBJECT and BODY hnames are handled in a case-insensitive manner. In fact, all hnames are handled in a case-insensitive manner.
In HTML and XML markup, "&" must be represented as "&amp;".
In short, mailto URIs just carry a bunch of utf-8-percent-encoded field values that mail clients percent-decode. That's all mailto URIs are.
TO, SUBJECT, BODY, CC and BCC are the basic hnames that should be supported.
Everything after "mailto:" on up to (but not including) "?" (or the end of the URI if a "?" is not present) also represents a TO hvalue (where "mailto:" represents the TO hname for that value).
"Comma-separated" refers to separating by a comma and a space. For example: "1, 2, 3, 4, 5" and NOT "1,2,3,4,5". In the case of an hvalue, each comma is percent-encoded as %2C and each space is percent-encoded as %20.
"\r\n-separated" refers to separating by a carriage return (0x0D) and a line feed (0x0A). In the case of an hvalue, each carriage return is percent-encoded as %0D and each line feed is percent-encoded as %0A. For example, "line1%0D%0Aline2".
BCC addresses are meant to be somewhat private. Having a BCC address (in a mailto URI or in raw format) in public content reduces that privacy. Despite the privacy concern with public content, mail clients should still accept BCC hvalues from a mailto URI.
(See RFC2822 - 3.4. Address Specification for specifics.)
Here are some example addresses. They are shown in unencoded form. If these values are put in a mailto URI, they need to be utf-8-percent-encoded first.
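As a rough illustration of that encoding step, the sketch below utf-8-percent-encodes a couple of made-up addresses with JavaScript's built-in encodeURIComponent. The addresses and the helper name encodeAddressList are hypothetical; joining with a comma and a space follows the "comma-separated" definition given earlier.

```javascript
// Hypothetical example addresses (not taken from this document).
var addresses = [
  'user@example.com',
  '"Doe, John" <john@example.com>'
];

// Join with ", " and utf-8-percent-encode the whole list as one hvalue.
function encodeAddressList(list) {
  return encodeURIComponent(list.join(', '));
}

var toHvalue = encodeAddressList(addresses);

// Only the encoded value goes into the URI.
var uri = 'mailto:?to=' + toHvalue;
```

Note that the comma inside the quoted display name and the comma separating the two addresses both come out as %2C; the mail client tells them apart only after decoding.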
Again, addresses don't go in mailto URIs. Only utf-8-percent-encoded values representing addresses do. The mail client won't see any addresses until it parses and utf-8-percent-decodes the mailto URI.
"string" in the following encoding, decoding and parsing steps refers to a sequence of 8-bit unsigned characters representing groups of valid utf-8 sequences. If working with a string with a different character width, adjust the steps accordingly.
Hvalues and hnames are strings where (at least) the reserved characters are utf-8-percent-encoded.
While I is less than the length of S:
If C equals a Line Feed (0x0A):
Else if C is found in NOENCODE:
Else:
All characters in an hvalue or hname that are not in the NOENCODE group need to be utf-8-percent-encoded to their corresponding %HH. "@", for example, must still be utf-8-percent-encoded to %40, just as it is in HTTP query string values. It is not a separator in a mailto URI; it is only a separator in a raw address value, which mailto URIs do not contain. The same goes for "+": it is not exempt from being utf-8-percent-encoded to "%2B". (Mail clients still need to treat "@" as "@" and "+" as "+" should they occur in raw form in the URI.)
Just like hvalues, hnames are made up of utf-8-percent-encoded sequences. This allows clients to support unicode hnames if needed. For example, mailto:?%E2%88%9A=%E2%88%9A is valid in a mailto URI.
In short, hnames and hvalues are just utf-8-percent-encoded versions of some unencoded values. You can think of it (in Javascript terms) as: encodeURIComponent(some_value) + '=' + encodeURIComponent(some_value) with the addition that all \r\n, stray \r and \n are represented as %0D%0A.
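A minimal sketch of that idea, assuming the unencoded values are ordinary JavaScript strings. The names encodeMailtoComponent and buildMailtoUri are made up for this example. Note that encodeURIComponent throws a URIError on lone surrogates, so callers may want a try/catch around it.

```javascript
// Normalize \r\n, stray \r and stray \n to \r\n, then utf-8-percent-encode.
function encodeMailtoComponent(value) {
  return encodeURIComponent(String(value).replace(/\r\n|\r|\n/g, '\r\n'));
}

// pairs is an array of [hname, hvalue] arrays holding unencoded values.
function buildMailtoUri(pairs) {
  var encoded = [];
  for (var i = 0; i < pairs.length; i++) {
    encoded.push(encodeMailtoComponent(pairs[i][0]) + '=' +
                 encodeMailtoComponent(pairs[i][1]));
  }
  return 'mailto:?' + encoded.join('&');
}

var exampleUri = buildMailtoUri([
  ['to', 'user@example.com'],
  ['subject', 'Hello there'],
  ['body', 'line1\nline2']
]);
// exampleUri is
// "mailto:?to=user%40example.com&subject=Hello%20there&body=line1%0D%0Aline2"
```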
If your encoding (or decoding) methods are strict and throw exceptions (on invalid utf-8 sequences, for example), it is up to you whether you catch those exceptions and return an empty string, the original string or something else. Returning an empty string on an error is often desirable, but you may want to be more relaxed or more strict.
Subject fields in many mail clients don't support newlines. Clients may strip the newlines out, but they might not and might break something. If unsure, you can strip all %0D%0A from the subject hvalue. Or, you can convert each %0D%0A to a %20. Or, you can convert each %0D to %20 and each %0A to %20 so that you can have *some* clue as to where the newlines were at by looking for 2 consecutive spaces in the subject field.
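The three options can be sketched as simple replacements on the already-encoded subject hvalue. The function names are hypothetical, and the sketch assumes newlines in the hvalue were encoded as %0D%0A pairs as described above.

```javascript
// Option 1: strip every %0D%0A.
function stripSubjectNewlines(hvalue) {
  return hvalue.replace(/%0D%0A/gi, '');
}

// Option 2: turn every %0D%0A into a single %20.
function subjectNewlinesToSpace(hvalue) {
  return hvalue.replace(/%0D%0A/gi, '%20');
}

// Option 3: turn each %0D and each %0A into %20, so that a former
// newline shows up as two consecutive spaces.
function subjectNewlinesToTwoSpaces(hvalue) {
  return hvalue.replace(/%0D|%0A/gi, '%20');
}
```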
Note that <input type="url"> in Web Forms 2.0 requires an IRI. In this case, in addition to the normal reserved characters that need to be encoded, instances of "(", ")", "!", "*", and "'" may also need to be utf-8-percent-encoded to their corresponding %HH so that the input will consider the mailto URI an IRI. Otherwise, the input will consider the value invalid (unless the UA allows the invalid input and fixes it for you). See RFC3987 for details.
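Since JavaScript's encodeURIComponent leaves exactly those five characters unencoded, one way to cover this case is to encode them in a second pass. This is only a sketch; encodeForIriInput is a made-up name.

```javascript
// utf-8-percent-encode s, then also encode ! ' ( ) * which
// encodeURIComponent leaves as-is.
function encodeForIriInput(s) {
  return encodeURIComponent(s).replace(/[!'()*]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

var strictHvalue = encodeForIriInput("it's (ok)!*");
// strictHvalue is "it%27s%20%28ok%29%21%2A"
```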
A raw "#" should never appear in a mailto URI. It should always be represented in a mailto URI as %23. If a raw "#" happens to be in a mailto URI, some mail clients (or browsers passing a mailto URI to a mail client) might treat it as a fragment identifier, where the "#" and everything after it is ignored. Other clients may treat it as just another character. Making sure "#" is always encoded as %23 avoids this issue. However, if a raw "#" is present in a mailto URI, the suggestion is to do what most clients do, and that's to treat the "#" as just another unreserved character. For example, mailto:#test would result in "#test" being in the TO field. mailto:?subject=#test would result in "#test" being in the subject field. In short, there's no such thing as a fragment identifier in a mailto URI, according to most clients.
If, for whatever reason, you want to compose a URI that puts just numbers in the TO field, you have to be careful. For example, if you wanted 44 to show up in the TO field, you might use "mailto:44". However, this creates a problem if you enter the URI in a browser's address field. The browser's address field might think that you mean to go to some mailto site on port 44. To avoid this, use "mailto:?to=44" instead. Or, utf-8-percent-encode the first 4 to %34. This should not be a problem for links in a web page.
While I is less than the length of S:
If C equals "%" and I + 2 is less than the length of S:
If F1 or F2 was not found in HEXITS:
Else:
Else:
RET will be a string of raw utf-8 sequences with all newline pairs and single newlines normalized to \n.
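One possible reading of the decoding steps, sketched in JavaScript. It collects bytes, decodes them as utf-8 and then normalizes newlines to \n. Treating a "%" that is not followed by two hex digits as a literal character is an assumption here, in line with the %25 normalization this document recommends for invalid %HH. The name decodeHvalue is made up, and TextEncoder/TextDecoder are assumed to be available (modern browsers and Node.js). Invalid utf-8 byte sequences come out as U+FFFD rather than raising an error.

```javascript
var HEXITS = '0123456789ABCDEFabcdef';

function decodeHvalue(s) {
  var utf8 = new TextEncoder();
  var chars = Array.from(s); // iterate by code point, not by UTF-16 unit
  var bytes = [];
  var i = 0;
  while (i < chars.length) {
    var c = chars[i];
    if (c === '%' && i + 2 < chars.length &&
        HEXITS.indexOf(chars[i + 1]) !== -1 &&
        HEXITS.indexOf(chars[i + 2]) !== -1) {
      // Valid %HH: emit the byte it stands for.
      bytes.push(parseInt(chars[i + 1] + chars[i + 2], 16));
      i += 3;
    } else {
      // Raw character (including a stray "%"): emit its utf-8 bytes.
      var raw = utf8.encode(c);
      for (var j = 0; j < raw.length; j++) {
        bytes.push(raw[j]);
      }
      i += 1;
    }
  }
  var text = new TextDecoder('utf-8').decode(new Uint8Array(bytes));
  // Normalize \r\n pairs and stray \r to \n.
  return text.replace(/\r\n|\r/g, '\n');
}
```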
In a mailto URI, there can be more than one hname with the same name. When a client parses a mailto URI to generate a TO, CC, BCC, SUBJECT or BODY value, the following rules should be followed.
The decoded version of the generated TO value is what browsers should use for the mailto link "Copy email address" feature. "Copy subject", "Copy body", "Copy CC addresses" and "Copy BCC addresses" features should use the decoded version of their corresponding generated value.
If the client supports other address-based hnames, the TO, CC and BCC rules can be used if desired. If the client supports other multiline-based hnames, the body rule can be used. If the client supports other single-line hnames, the subject rule can be used.
Note that array-based hnames like ?body[]=line1&body[]=line2 are not supported in mailto URIs.
For TO, CC and BCC (and generally other address types), there is no need to join empty hvalues. You'll end up getting ", , , , ," if you do.
For SUBJECT, since there can be only one subject in a message, make each new SUBJECT hvalue override the previous one. Also, there are existing implementations that already do this.
For BODY, do not join empty BODY hvalues until there is a BODY hvalue with significant content (non-empty). That way, empty body hvalues do not cause a bunch of empty lines at the top of the BODY field before there's significant content.
If you want your mailto URI to be handled correctly in clients that don't support duplicate hnames, don't use duplicate hnames in your URI. Currently, there's not much need to anyway.
Open issue - Need to fix the pre-parsing a little to convert "mailto:&&&?" to "mailto:%26%26%26?" so that splitting by '&' comes out better.
Before parsing, the URI needs to be stripped of "mailto:" and converted into a dataset of &-separated hname=hvalue pairs so that they can be split by "&".
IF S starts with a case-insensitive "mailto:":
IF one or more '?' are present in SUB:
Else:
Since "mailto:" is considered a TO hname, step #3 will ensure that there's always a TO hvalue in the returned dataset, even if the hvalue is empty. If you do not want this (for example, because you're going to split the dataset and store the values in a multimap of some sort and want to skip the first TO entry if it's empty), you can use the following steps for #3 instead.
IF S equals a case-insensitive "mailto:":
Else IF S starts with a case-insensitive "mailto:?":
Else:
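The pre-parsing idea, including the variant that skips an empty leading TO entry, might look roughly like this in JavaScript. The function name mailtoToDataset and the skipEmptyTo flag are hypothetical. Returning null for input that does not start with "mailto:" is an assumption, and the sketch does not address a raw "&" appearing before the "?".

```javascript
// Turn "mailto:<to-part>?<pairs>" into "to=<to-part>&<pairs>".
function mailtoToDataset(uri, skipEmptyTo) {
  if (!/^mailto:/i.test(uri)) {
    return null; // assumption: not a mailto URI, nothing to do
  }
  var sub = uri.slice('mailto:'.length);
  var q = sub.indexOf('?'); // only the first "?" acts as a separator
  var toPart = q === -1 ? sub : sub.slice(0, q);
  var rest = q === -1 ? '' : sub.slice(q + 1);
  if (skipEmptyTo && toPart === '') {
    return rest; // variant: no "to=" entry when the TO part is empty
  }
  return q === -1 ? 'to=' + toPart : 'to=' + toPart + '&' + rest;
}
```

For example, mailtoToDataset("MAILTO:a%40b.c?subject=hi") gives "to=a%40b.c&subject=hi", while mailtoToDataset("mailto:?cc=x", true) gives "cc=x".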
While I is less than the length of HLIST:
If a '=' is found in S:
IF HNAME is empty:
IF LCCHECK equals "to" or "cc" or "bcc" or "subject" or "body":
If HNAME equals "to":
If HVALUE is not empty:
If TO is not empty:
Else if HNAME equals "cc":
If HVALUE is not empty:
If CC is not empty:
Else if HNAME equals "bcc":
If HVALUE is not empty:
If BCC is not empty:
Else if HNAME equals "subject":
Else if HNAME equals "body":
If (HVALUE is empty and BODY is empty) equals false:
If BODY is not empty:
Else if HNAME equals "another hname you want to support":
Handle multiple instances of HNAME as desired. However, if it's an address type, it is recommended that you parse it like TO, CC and BCC and handle HNAME in a case-insensitive manner.
The decoded values are what mail clients use (after converting them to the needed encoding) to fill in a compose form's text fields.
Note that for step #2, splitting the string into an array is just one example. You can parse the string directly if desired.
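The merging rules for duplicate hnames can be sketched as follows. This is an illustrative reading of the steps, not a reference implementation. It takes the "&"-separated dataset (with the leading "to=" entry already added), and the names decodeComponent and parseMailtoDataset are made up. Assumptions: a pair without "=" is treated as an hname with an empty hvalue, unknown hnames are ignored, BODY hvalues are joined with "\n", and a value with invalid utf-8 decodes to an empty string.

```javascript
// Lenient decode: a "%" that is not part of a valid %HH is kept literally.
function decodeComponent(s) {
  try {
    return decodeURIComponent(s.replace(/%(?![0-9A-Fa-f]{2})/g, '%25'))
      .replace(/\r\n|\r/g, '\n');
  } catch (e) {
    return ''; // invalid utf-8 sequence
  }
}

function parseMailtoDataset(dataset) {
  var out = { to: '', cc: '', bcc: '', subject: '', body: '' };
  var pairs = dataset.split('&');
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf('=');
    var rawName = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    var rawValue = eq === -1 ? '' : pairs[i].slice(eq + 1);
    // Decode the hname first, then lowercase it for matching.
    var hname = decodeComponent(rawName).toLowerCase();
    var hvalue = decodeComponent(rawValue);
    if (hname === 'to' || hname === 'cc' || hname === 'bcc') {
      // Address fields: skip empty hvalues, join the rest with ", ".
      if (hvalue !== '') {
        out[hname] = out[hname] === '' ? hvalue : out[hname] + ', ' + hvalue;
      }
    } else if (hname === 'subject') {
      // The last SUBJECT hvalue wins.
      out.subject = hvalue;
    } else if (hname === 'body') {
      // Ignore empty BODY hvalues until there is some content.
      if (!(hvalue === '' && out.body === '')) {
        out.body = out.body === '' ? hvalue : out.body + '\n' + hvalue;
      }
    }
  }
  return out;
}
```

With the dataset "to=a%40x.com&to=&TO=b%40x.com&subject=one&subject=two&body=&body=l1&body=&body=l2", this yields a TO value of "a@x.com, b@x.com", a subject of "two" and a body of "l1", an empty line, then "l2".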
If your app and/or your parser does not support null bytes, before parsing the mailto URI, convert "%00" (and raw null bytes) to "%2500" so that when decoding, they show up in the compose field as "%00". Use this same type of normalization for other bytes that are not supported and for invalid %HH. For example, with "mailto:?subject=%YY", %YY is not a valid %HH. It's really just a % (that should have been utf-8-percent-encoded to %25) and 2 Y's. %YY should be treated as %25YY so that when it's decoded, it comes out as %YY in the subject field. This is necessary for strict decoders.
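A small sketch of that normalization using regular expressions, intended to run on the whole URI before a strict decoder such as decodeURIComponent sees it. The function name normalizeForStrictDecoder is hypothetical.

```javascript
function normalizeForStrictDecoder(uri) {
  return uri
    // A "%" not followed by two hex digits becomes %25.
    .replace(/%(?![0-9A-Fa-f]{2})/g, '%25')
    // Encoded and raw null bytes become the text "%00" once decoded.
    .replace(/%00|\x00/g, '%2500');
}

var normalized = normalizeForStrictDecoder('mailto:?subject=%YY');
// normalized is "mailto:?subject=%25YY"
```

An already-escaped "%2500" is left alone, since it contains no "%00" sequence of its own.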
Click on the link. In your mail client's compose form, you should get the following results.
To:
Cc:
Bcc:
Subject:
Note that there may be extra lines in the body in your mail client if it has signature support.
Also note that for the TO, CC and BCC fields, user-input in most clients requires properly-escaped addresses where certain characters are escaped with "\" and other parts are quoted (see Address Syntax). However, some clients allow unescaped user-input and will, after decoding an hvalue, deescape characters escaped with "\" when filling in the fields. As long as the intended headers are produced in the outgoing message, the client can use whatever user-input method is desired.
Note that this test currently does not test the matching of an hname if it's encoded. For example, "mailto:?%74%6F=email%40site.com" is the same as "mailto:?to=email%40site.com". This is covered in the parsing section though where the hname is percent-decoded first before lowercasing and checking for a match. Will eventually update the test to cover this though.
Note that the implementations below may not always be aligned with the spec. As of right now, they're not, though the C++ version is closest. Consider them just helpful examples for now until things can be set up better to stay in line with the spec.
In MailtoURIParserPack.zip are C, C++, D, Java, Javascript, Perl, Python, Ruby, Pike, Lua, Tcl, PHP5 and Python3000 MailtoURIParser classes that use the rules in this document to show examples of parsing. The test above uses a Javascript version to parse the mailto link to generate the form data.
Mozilla Thunderbird 3.0 generally parses mailto URIs according to the rules in this document (keeping the deescaping note for the TO, CC and BCC fields in mind). Thunderbird 3.0 passes the test above.
Opera 9.5 generally parses mailto URIs for its built-in mail client according to the rules in this document. It passes the test above.
An Opera UserJS mailto link handler that opens mailto links in various webmails.
Javascript mailto_uri_parser object
The action attribute value of an HTML form can be any valid mailto URI including just "mailto:". When the form is submitted, how the URI is generated before being passed to the mail client depends on the form submission method.
Mailto URIs are NOT of type application/x-www-form-urlencoded and mail clients don't decode + to a space. Spaces in the generated mailto URI must be represented as %20 and NOT +.
Further, for '+' chars in the action attribute, when the UA submits to an HTTP-based mailto handler, the UA must first convert them to %2B to avoid HTTP clients percent-decoding them to a space. Authors should also use %2B instead of '+' in the action attribute to avoid these problems just in case the UA doesn't do the conversion.
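To illustrate the difference, the sketch below takes a query string produced by a form-urlencoded serializer (here the standard URLSearchParams, which writes spaces as "+" and literal plus signs as %2B) and rewrites it for use in a mailto URI. The helper name formEncodedToMailtoQuery is made up.

```javascript
// In form-urlencoded output a raw "+" always stands for a space,
// because a literal "+" is already written as %2B.
function formEncodedToMailtoQuery(query) {
  return query.replace(/\+/g, '%20');
}

var formEncoded = new URLSearchParams([['subject', 'a b+c']]).toString();
// formEncoded is "subject=a+b%2Bc"

var mailtoQuery = formEncodedToMailtoQuery(formEncoded);
// mailtoQuery is "subject=a%20b%2Bc"
```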
When submitting to the mail client with method="get", "mailto:?" plus the encoded data set for the form should be generated. Existing hvalues in the mailto URI will be ignored.
Open issue - Actually, for GET, browsers will only ignore hnames after '?'. With action="mailto:email%40example.com", browsers will keep the email%40example.com part, append '?' and then the dataset. Since "mailto:email%40example.com" is equivalent to "mailto:?to=email%40example.com", browsers *technically* shouldn't keep the To hvalue. However, that would require browsers to have specific appending behavior just for mailto URIs. So, this GET section will probably change to match browsers in this aspect.
For example, the following form would generate and submit "mailto:?to=to1%40example.com%2C%20to2%40example.com&to=to3%40example.com%2C%20to4%40example.com&cc=cc1%40example.com%2C%20cc2%40example.com&cc=cc3%40example.com%2C%20cc4%40example.com&bcc=bcc1%40example.com%2C%20bcc2%40example.com&bcc=bcc3%40example.com%2C%20bcc4%40example.com&subject=subject%201&subject=subject%202&body=Line%201%0D%0ALine%202&body=Line%203%0D%0ALine%204" to the mail client.
<form action="mailto:?body=This%20hvalue%20will%20be%20removed." method="get">
<p>
<input type="text" name="to" value="to1@example.com, to2@example.com">
<input type="text" name="to" value="to3@example.com, to4@example.com">
<input type="text" name="cc" value="cc1@example.com, cc2@example.com">
<input type="text" name="cc" value="cc3@example.com, cc4@example.com">
<input type="text" name="bcc" value="bcc1@example.com, bcc2@example.com">
<input type="text" name="bcc" value="bcc3@example.com, bcc4@example.com">
<input type="text" name="subject" value="subject 1">
<input type="text" name="subject" value="subject 2">
<textarea name="body">Line 1
Line 2</textarea>
<textarea name="body">Line 3
Line 4</textarea>
<input type="submit" value="Compose">
</p>
</form>
So, for method="get", action="mailto:" is what you should use.
When submitting to the mail client with method="post", the encoded data set is itself encoded and used as the BODY hvalue of the generated mailto URI. A Subject hvalue is also generated that contains the User Agent string, which may be overridden by the last Subject hvalue in the URI if one is present. Also, since this is POST, if there are any existing hvalues in the mailto URI, they are kept.
For example, in the following form, "mailto:?body=This%20hvalue%20will%20NOT%20be%20removed.&subject=Form%20Post%20from%20Opera%2F9.50%20(Windows%20NT%205.1%3B%20U%3B%20en)&body=body%3DJust%2520a%2520test." will be generated.
<form action="mailto:?body=This%20hvalue%20will%20NOT%20be%20removed." method="post">
<p>
<input type="text" name="body" value="Just a test.">
<input type="submit" value="Compose">
</p>
</form>
Also see JS parsing examples and createMailtoURIFromEnabledFormControls() in BeforeMailtoURL.js for details.
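The POST behavior described above can be sketched roughly as follows. The function name buildPostMailtoUri is hypothetical, and the "Form Post from " subject prefix is inferred from the example URI above rather than from a stated rule.

```javascript
// fields is an array of [name, value] arrays from the form controls.
function buildPostMailtoUri(action, fields, userAgent) {
  function enc(s) {
    return encodeURIComponent(String(s).replace(/\r\n|\r|\n/g, '\r\n'));
  }
  // First encode the form data set as usual ...
  var pairs = [];
  for (var i = 0; i < fields.length; i++) {
    pairs.push(enc(fields[i][0]) + '=' + enc(fields[i][1]));
  }
  var dataset = pairs.join('&');
  // ... then keep the existing hvalues and append the generated
  // subject and the doubly-encoded data set as a BODY hvalue.
  var sep = action.indexOf('?') === -1 ? '?' : '&';
  return action + sep +
    'subject=' + enc('Form Post from ' + userAgent) +
    '&body=' + enc(dataset);
}

var postUri = buildPostMailtoUri(
  'mailto:?body=This%20hvalue%20will%20NOT%20be%20removed.',
  [['body', 'Just a test.']],
  'Opera/9.50 (Windows NT 5.1; U; en)'
);
```

With these inputs, postUri matches the URI shown in the POST example above.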
action="mailto:" is partially broken (at different levels) in Opera, Safari, Firefox and IE. Firefox has the best support though with its only problem being that it encodes spaces as +.
Open issue - Need to describe exact resolving methods based on what browsers do and on what's safe.
When browsers resolve mailto URIs during markup parsing or input submission, raw white-space (leading and trailing stripped when parsing), non-ascii characters and percent-encodings not based on utf-8 (even if the URI is represented with a non-ascii string and/or the document encoding is not utf-8) should be utf-8-percent-encoded to %HH. For example, after entering "mailto:√?subject=1 2√" into a browser's address field and pressing enter, it should be resolved to "mailto:%E2%88%9A?subject=1%202%E2%88%9A". It is very important that this happens before passing a mailto URI on the command line. The presence of raw quotes, backslashes and wide characters can split up commands and execute arbitrary programs (even if the URI is quoted before passing). This is also very important for the mailto handler as it might only be able to percent-decode utf-8-based percent-encoded strings. Also, if the URI looks like an IRI, it should be resolved to a URI.
Also, when browsers resolve, if a mailto URI contains a %HH sequence representing a character that is not reserved, the %HH sequence should be left alone (at least in places like the address field where a user might be manually editing the URI, or for links when the user might "Copy link address"). It should not be decoded just because it doesn't need to be encoded. For example, "mailto:%2E?subject=%2E" should not be resolved to "mailto:.?subject=.".
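As a rough sketch of this kind of resolving, the function below trims the input and utf-8-percent-encodes white-space, non-ascii characters, quotes and backslashes, while leaving every existing %HH sequence untouched. The name resolveMailtoUri and the exact set of encoded characters are assumptions for illustration. The sketch does not try to detect or repair percent-encodings that are not based on utf-8, and it assumes TextEncoder is available.

```javascript
function resolveMailtoUri(input) {
  var utf8 = new TextEncoder();
  // Keep printable ascii except " ' and \; encode everything else.
  var unsafe = /[^\x21\x23-\x26\x28-\x5B\x5D-\x7E]/gu;
  return input.trim().replace(unsafe, function (ch) {
    var bytes = utf8.encode(ch);
    var out = '';
    for (var i = 0; i < bytes.length; i++) {
      out += '%' + (bytes[i] < 16 ? '0' : '') +
             bytes[i].toString(16).toUpperCase();
    }
    return out;
  });
}

var resolved = resolveMailtoUri('  mailto:?subject="a b"  ');
// resolved is "mailto:?subject=%22a%20b%22"
```

Because "%" itself is never touched, an input such as "mailto:%2E?subject=%2E" comes back unchanged.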
Also, a browser's "Copy link address" option should copy the mailto link to the clipboard in its fully-encoded, resolved state.
Also, by default, browsers should display mailto links in status bars and tooltips in their fully-encoded, resolved state. mailto URIs can lose their meaning when you show them in decoded form and it's no help to the user. The user will get to see everything in decoded format during the compose preview. However, if there's a way for the user to distinguish between a separator and decoded text that looks like a separator (by changing the color of the real separator for example), showing the URI in decoded form may be O.K.
A browser's "Send link by mail" feature should generally work like this Javascript example:
Open issue - Thunderbird's different command line methods for supporting Send Link By mail are slightly more complicated than below. Need to update the Thunderbird examples.
function UTF8PercentEncodeWithNormalizedNewlines(s) {
    try {
        // Normalize raw newlines first so that *if* there are any newlines
        // in s, \r\n, stray \r and \n all come out as %0D%0A.
        return encodeURIComponent(s.replace(/\r\n|\r|\n/g, "\r\n"));
    } catch (e) {
        return "Error%20encoding%20data.";
    }
}
function UTF8PercentEncodeWithNewlinesStripped(s) {
    try {
        // Subjects are single-line, so drop every raw \r and \n.
        return encodeURIComponent(s.replace(/\r|\n/g, ""));
    } catch (e) {
        return "Error%20encoding%20data.";
    }
}
function sendLinkByMail() {
    var subject = UTF8PercentEncodeWithNewlinesStripped(document.title);
    var body = UTF8PercentEncodeWithNormalizedNewlines("<" + document.location + ">");
    var uri = "mailto:?subject=" + subject + "&body=" + body;
    window.open(uri);
}
// For Thunderbird 2 (not 3) with HTML composition turned on for example
function sendLinkByMailBodyIsHTML() {
    var subject = UTF8PercentEncodeWithNewlinesStripped(document.title);
    // The body is treated as HTML, so the angle brackets around the
    // address are written as entities to keep them visible as text.
    var body = UTF8PercentEncodeWithNormalizedNewlines("&lt;" + document.location + "&gt;");
    var uri = "mailto:?subject=" + subject + "&body=" + body;
    window.open(uri);
}
// For Thunderbird 2 (not 3) with HTML composition turned on for example
function sendLinkByMailBodyIsHTMLActualLink() {
    var subject = UTF8PercentEncodeWithNewlinesStripped(document.title);
    var link = '<a href="' + document.location + '">' + document.location + '</a>';
    var body = UTF8PercentEncodeWithNormalizedNewlines(link);
    var uri = "mailto:?subject=" + subject + "&body=" + body;
    window.open(uri);
}