Modern Mailto URI Scheme

Abstract

This draft document defines the format and handling of mailto URIs.

RFC2368 does not apply to this document.


Introduction

Mailto URIs are used to specify email message compose data in a portable format. Mail clients parse and decode this data to generate default text field values for compose forms, which users can review before sending.

Syntax

"mailto:?" + "&-separated list of hname=hvalue pairs"

All hname=hvalue pairs are optional.

"mailto:" and the TO, CC, BCC, SUBJECT and BODY hnames are handled in a case-insensitive manner. Other hnames are generally handled in a case-sensitive manner, but it's up to the handler of the URI.

In HTML and XML markup, "&" must be represented as "&amp;".

In short, mailto URIs just carry a bunch of percent-encoded field values that mail clients percent-decode. That's all mailto URIs are.

The basic hnames

TO, SUBJECT, BODY, CC and BCC are the basic hnames that should be supported.

Everything after "mailto:" up to (but not including) the first "?" (or to the end of the URI if no "?" is present) also represents a TO hvalue; here, "mailto:" itself acts as the TO hname for that value.

Definitions for the basic hvalues

TO
A properly-escaped, encoded, comma-separated list of addresses to send a copy of the message to (with the knowledge of other recipients), where the recipients are expected to take part in the discussion.
CC
A properly-escaped, encoded, comma-separated list of addresses to send a copy of the message to (with the knowledge of other recipients), where the recipients are at least expected to read the discussion.
BCC
A properly-escaped, encoded, comma-separated list of addresses to send a copy of the message to (without the knowledge of other recipients), where the recipients are at least expected to read the discussion.
SUBJECT
The subject of the discussion, encoded.
BODY
An encoded, \r\n-separated list of lines representing the body content of the message.

"Comma-separated" refers to separating by a comma and a space. For example: "1, 2, 3, 4, 5" and NOT "1,2,3,4,5". In the case of an hvalue, each comma is encoded as %2C and each space is encoded as %20.

"\r\n-separated" refers to separating by a carriage return (0x0D) and a line feed (0x0A). In the case of an hvalue, each carriage return is encoded as %0D and each line feed is encoded as %0A. For example, "line1%0D%0Aline2".

BCC addresses are meant to be somewhat private. Having a BCC address (in a mailto URI or in raw format) in public content reduces that privacy. Despite the privacy concern with public content, mail clients should still accept BCC hvalues from a mailto URI.
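As an illustration of the comma-separated and \r\n-separated formats defined above, the following JavaScript sketch joins addresses with ", " and body lines with "\r\n" before encoding them. The helper names joinAddresses and joinBodyLines are hypothetical, and encodeURIComponent stands in for the encoding steps described later in this document.

```javascript
// Hypothetical helper: joins addresses with a comma and a space, as TO, CC and BCC expect.
function joinAddresses(addresses) {
    return addresses.join(", ");
}

// Hypothetical helper: joins body lines with a carriage return and a line feed.
function joinBodyLines(lines) {
    return lines.join("\r\n");
}

// encodeURIComponent turns "," into %2C, " " into %20, "\r" into %0D and "\n" into %0A.
const toHvalue = encodeURIComponent(joinAddresses(["a@example.com", "b@example.com"]));
const bodyHvalue = encodeURIComponent(joinBodyLines(["line1", "line2"]));

console.log(toHvalue);   // a%40example.com%2C%20b%40example.com
console.log(bodyHvalue); // line1%0D%0Aline2
```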

Address syntax

Address examples

(See RFC2822 - 3.4. Address Specification for specifics.)

Here are some example addresses. They are shown in unencoded form. If these values are put in a mailto URI, they need to be encoded first.

Again, addresses don't go in mailto URIs. Only percent-encoded values representing addresses do. The mail client won't see any addresses until it parses and decodes the mailto URI.

URI Examples

Note about "string"

"string" in the following encoding, decoding and parsing steps refers to a sequence of 8-bit unsigned characters. If you are working with a string of characters of a different type and/or width, adjust the steps accordingly.

Encoding data

Hvalues and hnames need to be utf-8-percent-encoded.

  1. Let S be a string of raw utf-8 sequences.
  2. Replace all Carriage Return + Line Feed pairs (0x0D + 0x0A) in S with Line Feeds (0x0A).
  3. Replace all Carriage Returns (0x0D) in S with Line Feeds (0x0A).
  4. Let NOENCODE be the string "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()".
  5. Let HEXITS be the string "0123456789ABCDEF".
  6. Let I be the current position in S starting at 0.
  7. Let RET be the empty string to hold the built value.
  8. While I is less than the length of S:

    1. Let C be the character in S at position I.
    2. If C equals a Line Feed (0x0A):

      1. Append "%0D%0A" to RET.
    3. Else if C is found in NOENCODE:

      1. Append C to RET.
    4. Else:

      1. Append "%" to RET.
      2. Let CI be the 8bit unsigned value (0 to 255) that represents C.
      3. Let F1 be the value of CI >> 4.
      4. Let F2 be the value of CI & 0x0F.
      5. Let A be the character at position F1 in HEXITS.
      6. Let B be the character at position F2 in HEXITS.
      7. Append A to RET.
      8. Append B to RET.
    5. Increment I by 1.
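The encoding steps above can be sketched in JavaScript roughly as follows. This is a non-authoritative sketch that assumes TextEncoder is available (modern browsers and Node.js 11+) to produce the raw UTF-8 bytes the steps call S; mailtoEncode is a hypothetical name.

```javascript
const NOENCODE = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";
const HEXITS = "0123456789ABCDEF";

// Hypothetical implementation of the encoding steps above.
function mailtoEncode(value) {
    // Steps 2-3: turn CR LF pairs and stray CRs into LFs. CR and LF are single
    // bytes in UTF-8, so doing this before step 1 gives the same result.
    const normalized = value.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
    // Step 1: S as raw UTF-8 bytes (0 to 255). TextEncoder replaces lone
    // surrogates with U+FFFD instead of throwing.
    const s = new TextEncoder().encode(normalized);
    let ret = "";
    for (let i = 0; i < s.length; i++) {
        const ci = s[i];
        const c = String.fromCharCode(ci);
        if (ci === 0x0A) {
            ret += "%0D%0A";               // every newline becomes CR LF
        } else if (NOENCODE.indexOf(c) !== -1) {
            ret += c;                      // characters in NOENCODE stay as-is
        } else {
            // %HH with uppercase hexits: high nibble first, then low nibble.
            ret += "%" + HEXITS.charAt(ci >> 4) + HEXITS.charAt(ci & 0x0F);
        }
    }
    return ret;
}

console.log(mailtoEncode("x+y@example.com #1")); // x%2By%40example.com%20%231
```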

All characters in an hvalue or hname that are not in the NOENCODE group need to be encoded to their corresponding %HH. "@", for example, must still be encoded to %40, just as it is in HTTP query string values. It is not a separator in a mailto URI; it is only a separator in a raw address value, which mailto URIs don't contain. The same goes for "+": it is not exempt from being encoded to "%2B". (Mail clients still need to treat a raw "@" as "@" and a raw "+" as "+" should they occur unencoded in the URI.)

Just like hvalues, hnames can contain UTF-8 percent-encoded sequences. This allows clients to support Unicode hnames if needed. For example, mailto:?%E2%88%9A=%E2%88%9A is a valid mailto URI.

In short, hnames and hvalues are just encoded versions of some unencoded values. In Javascript terms, you can think of each pair as encodeURIComponent(hname) + '=' + encodeURIComponent(hvalue), with the addition that every \r\n, stray \r and stray \n is represented as %0D%0A.
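Using that shortcut, a mailto URI can be assembled from hname/hvalue pairs as in this JavaScript sketch. encodeHPart and buildMailtoURI are hypothetical names. Note that encodeURIComponent throws a URIError on lone surrogates, which is the kind of error the next paragraph leaves up to you.

```javascript
// Hypothetical helper: encodeURIComponent plus newline normalization, so that
// \r\n, stray \r and stray \n all come out as %0D%0A.
function encodeHPart(s) {
    return encodeURIComponent(s.replace(/\r\n|\r|\n/g, "\r\n"));
}

// Hypothetical helper: "mailto:?" followed by &-separated hname=hvalue pairs.
function buildMailtoURI(pairs) {
    return "mailto:?" + pairs
        .map(([hname, hvalue]) => encodeHPart(hname) + "=" + encodeHPart(hvalue))
        .join("&");
}

const uri = buildMailtoURI([
    ["to", "a+b@example.com"],
    ["subject", "1 2"],
    ["body", "x\ny"],
    ["\u221A", "\u221A"],
]);
console.log(uri);
// mailto:?to=a%2Bb%40example.com&subject=1%202&body=x%0D%0Ay&%E2%88%9A=%E2%88%9A
```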

If your encoding (or decoding) methods are strict and, for example, throw exceptions on invalid UTF-8 sequences, it's up to you whether you catch those exceptions and return an empty string, the original string or something else. Returning an empty string on an error is often desirable, but you may want to be more relaxed or more strict.

Subject fields in many mail clients don't support newlines. Some clients strip the newlines out, but others might not and might break something. If unsure, you can strip all %0D%0A from the subject hvalue. Or, you can convert each %0D%0A to %20. Or, you can convert each %0D to %20 and each %0A to %20, so that you have *some* clue as to where the newlines were by looking for two consecutive spaces in the subject field.
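The three options can be written as string replacements on the encoded SUBJECT hvalue. This JavaScript sketch uses hypothetical function names and matches the hexits case-insensitively, on the assumption that lowercase forms such as %0d%0a may also appear.

```javascript
// Hypothetical helpers operating on an encoded SUBJECT hvalue. The "i" flag
// also matches lowercase hexits such as %0d%0a.
function stripSubjectNewlines(hvalue) {
    return hvalue.replace(/%0D%0A/gi, "");
}

function subjectNewlinesToSpace(hvalue) {
    return hvalue.replace(/%0D%0A/gi, "%20");
}

function subjectNewlinesToTwoSpaces(hvalue) {
    // Each %0D and each %0A becomes %20, so a former newline shows up as two spaces.
    return hvalue.replace(/%0D|%0A/gi, "%20");
}

console.log(subjectNewlinesToTwoSpaces("a%0D%0Ab")); // a%20%20b
```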

Note that <input type="url"> in Web Forms 2.0 requires an IRI. In this case, in addition to the normal reserved characters that need to be encoded, instances of "(", ")", "!", "*", and "'" may also need to be encoded to their corresponding %HH so that the input will consider the mailto URI an IRI. Otherwise, the input will consider the value invalid (unless the UA allows the invalid input and fixes it for you). See RFC3987 for details.
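For that case, one way to add the extra encoding on top of encodeURIComponent is sketched below in JavaScript (encodeForURLInput is a hypothetical name). The resulting %28, %29, %21, %2A and %27 decode back to the same characters, so mail clients still see the same values.

```javascript
// Hypothetical helper: encodeURIComponent leaves ( ) ! * ' unencoded,
// so encode those to their %HH as well.
function encodeForURLInput(s) {
    return encodeURIComponent(s).replace(/[()!*']/g,
        (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}

console.log(encodeForURLInput("Hi (you)!*'")); // Hi%20%28you%29%21%2A%27
```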

A raw "#" should never appear in a mailto URI. It should always be represented in a mailto URI as %23. If a raw "#" happens to be in a mailto URI, some mail clients (or browsers passing a mailto URI to a mail client) might treat it as a fragment identifier, where the "#" and everything after it is ignored. Other clients may treat it as just another character. Making sure "#" is always encoded as %23 avoids this issue. However, if a raw "#" is present in a mailto URI, the suggestion is to do what most clients do, and that's to treat the "#" as just another char. For example, mailto:#test would result in "#test" being in the TO field. mailto:?subject=#test would result in "#test" being in the subject field. In short, there's no such thing as a fragment identifier in a mailto URI, according to most clients.

If, for whatever reason, you want to compose a URI that puts just numbers in the TO field, you have to be careful. For example, if you wanted 44 to show up in the TO field, you might use "mailto:44". However, this creates a problem if you enter the URI in a browser's address field. The browser's address field might think that you mean to go to some mailto site on port 44. To avoid this, use "mailto:?to=44" instead. Or, percent-encode the first 4 to %34. This should not be a problem for links in a web page.

Decoding data

  1. Let S be the encoded string to decode.
  2. Let I be the current position in S, starting at 0.
  3. Let RET be the empty string to hold the built value.
  4. Let HEXITS be the string "0123456789ABCDEF".
  5. While I is less than the length of S:

    1. Let C be the character at position I.
    2. If C equals "%" and I + 2 is less than the length of S:

      1. Let F1 be the uppercase version of the character at position I + 1.
      2. Let F2 be the uppercase version of the character at position I + 2.
      3. Let A be the position of F1 in HEXITS.
      4. Let B be the position of F2 in HEXITS.
      5. If F1 or F2 was not found in HEXITS:

        1. Append C to RET.
      6. Else:

        1. Append the character represented by the 8bit unsigned value (0 to 255) of (A * 16 + B) to RET.
        2. Increment I by 2.
    3. Else:

      1. Append C to RET.
    4. Increment I by 1.
  6. Replace all Carriage Return + Line Feed pairs (0x0D + 0x0A) in RET with Line Feeds (0x0A).
  7. Then, replace all Carriage Returns (0x0D) in RET with Line Feeds (0x0A).

RET will be a string of raw utf-8 sequences with all newline pairs and single newlines normalized to \n.
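A JavaScript sketch of the decoding steps follows. To honor the note about 8-bit strings, it first converts the incoming JavaScript string to UTF-8 bytes with TextEncoder, runs the steps on those bytes, and then turns the result back into text with TextDecoder, which replaces invalid UTF-8 with U+FFFD rather than throwing. mailtoDecode is a hypothetical name.

```javascript
const HEXITS = "0123456789ABCDEF";

// Hypothetical implementation of the decoding steps above.
function mailtoDecode(str) {
    // S as 8-bit units. "%" and hexits are ASCII, and UTF-8 multi-byte
    // sequences never contain ASCII bytes, so raw non-ASCII text passes through.
    const s = new TextEncoder().encode(str);
    const out = [];
    let i = 0;
    while (i < s.length) {
        const c = s[i];
        if (c === 0x25 /* "%" */ && i + 2 < s.length) {
            const a = HEXITS.indexOf(String.fromCharCode(s[i + 1]).toUpperCase());
            const b = HEXITS.indexOf(String.fromCharCode(s[i + 2]).toUpperCase());
            if (a === -1 || b === -1) {
                out.push(c);               // invalid %HH: keep the "%"
            } else {
                out.push(a * 16 + b);
                i += 2;
            }
        } else {
            out.push(c);
        }
        i += 1;
    }
    // ignoreBOM: true keeps a leading U+FEFF instead of silently dropping it.
    const ret = new TextDecoder("utf-8", { ignoreBOM: true }).decode(new Uint8Array(out));
    // Steps 6-7: CR LF pairs and stray CRs become LFs.
    return ret.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

console.log(mailtoDecode("%E2%88%9A%20x")); // √ x
```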

Handling of duplicate hnames

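In a mailto URI, there can be more than one hname with the same name. When a client parses a mailto URI to generate a TO, CC, BCC, SUBJECT or BODY value, the following rules (which the Parsing steps below implement) should be followed:

  1. TO, CC and BCC: join the non-empty hvalues with "%2C%20" (an encoded ", ").
  2. SUBJECT: each SUBJECT hvalue overrides the previous one, so the last one wins.
  3. BODY: join the hvalues with "%0D%0A" (an encoded \r\n), ignoring empty BODY hvalues that come before the first non-empty one.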

The decoded version of the generated TO value is what browsers should use for the mailto link "Copy email address" feature. "Copy subject", "Copy body", "Copy CC addresses" and "Copy BCC addresses" features should use the decoded version of their corresponding generated value.

If the client supports other address-based hnames, the TO, CC and BCC rules can be used if desired. If the client supports other multiline-based hnames, the body rule can be used. If the client supports other single-line hnames, the subject rule can be used.

Note that array-based hnames like ?body[]=line1&body[]=line2 are not supported in mailto URIs.

Rationale

For TO, CC and BCC (and generally other address types), there is no need to join empty hvalues. You'll end up getting ", , , , ," if you do.

For SUBJECT, since there can be only one subject in a message, make each new SUBJECT hvalue override the previous one. Also, there are existing implementations that already do this.

For BODY, do not join empty BODY hvalues until there is a BODY hvalue with significant content (non-empty). That way, empty body hvalues do not cause a bunch of empty lines at the top of the BODY field before there's significant content.

If you want your mailto URI to be handled correctly in clients that don't support duplicate hnames, don't use duplicate hnames in your URI. Currently, there's not much need to anyway.

Pre-Parsing

Before parsing, the URI needs to be stripped of "mailto:" and converted into a dataset of &-separated hname=hvalue pairs so that they can be split by "&".

  1. Let S be the incoming string.
  2. Let P be an empty string to store the &-separated list of hname=hvalue pairs.
  3. IF S starts with a case-insensitive "mailto:":

    1. Let SUB be the substring of S starting at and including the character at position 4 to the end of S.
    2. Replace the first ':' in SUB with '='.
    3. IF one or more '?' are present in SUB:

      1. Replace only the first '?' with '&'.
    4. Assign SUB to P.
  4. Else:

    1. Handle the error as you desire. (Return an empty string for example.)
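In JavaScript, the pre-parsing steps above might look like the following sketch. preParseMailto is a hypothetical name; it returns an empty string for non-mailto input, which is one of the error-handling choices the steps allow.

```javascript
// Hypothetical implementation of the pre-parsing steps above.
function preParseMailto(s) {
    // Step 4: not a mailto URI. Returning an empty string is one possible choice.
    if (s.substring(0, 7).toLowerCase() !== "mailto:") {
        return "";
    }
    // Step 3.1: start at position 4 so that "mailto:" becomes "to:".
    let sub = s.substring(4);
    // Step 3.2: "to:" becomes "to=". With a string pattern, replace() only
    // replaces the first match.
    sub = sub.replace(":", "=");
    // Step 3.3: the first "?" (if any) becomes "&".
    sub = sub.replace("?", "&");
    return sub;
}

console.log(preParseMailto("mailto:a%40example.com?subject=hi"));
// to=a%40example.com&subject=hi
```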

Since "mailto:" is considered a TO hname, step #3 ensures that there is always a TO hvalue in the returned dataset, even if the hvalue is empty. If you do not want this (for example, because you are going to split the dataset, store the values in a multimap of some sort, and want to skip the first TO entry if it's empty), you can use the following steps for #3 instead.

  1. IF S equals a case-insensitive "mailto:":

    1. Return an empty string for the dataset.
  2. Else IF S starts with a case-insensitive "mailto:?":

    1. Return a substring consisting of everything in S after "mailto:?".
  3. Else:

    1. Return "to=" + everything in S after "mailto:" and replace the first "?" found with "&".
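A JavaScript sketch of these alternative steps is shown below. preParseMailtoSkipEmptyTo is a hypothetical name, and returning an empty string for non-mailto input is an assumed error-handling choice.

```javascript
// Hypothetical implementation of the alternative steps for #3.
function preParseMailtoSkipEmptyTo(s) {
    const lower = s.toLowerCase();
    if (!lower.startsWith("mailto:")) {
        return ""; // not a mailto URI; handle the error as you desire
    }
    if (lower === "mailto:") {
        return "";
    }
    if (lower.startsWith("mailto:?")) {
        return s.substring(8);
    }
    // "to=" + everything after "mailto:", with the first "?" replaced by "&".
    return "to=" + s.substring(7).replace("?", "&");
}

console.log(preParseMailtoSkipEmptyTo("mailto:?subject=hi")); // subject=hi
```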

Parsing

  1. Let P be the string with the &-separated list of hname=hvalue pairs.
  2. Split P by '&' into array HLIST. (use a method that produces the same result as ECMAScript's split function for example)
  3. Let I be the current position in HLIST starting at 0.
  4. Let TO, SUBJECT, BODY, CC and BCC be empty strings to store the encoded values.
  5. While I is less than the length of HLIST:

    1. Let S be the string at position I in HLIST.
    2. Let EQ be the position of the first '=' in S.
    3. If a '=' is found in S:

      1. Let HNAME be a string representing the decoded version of a substring of S starting at and including the character at position 0 on up to, but not including the character at position EQ.
      2. IF HNAME is empty:

        1. Increment I by 1 and continue to the next iteration of the loop.
      3. Let LCCHECK be a string representing the lowercase version of HNAME.
      4. IF LCCHECK equals "to" or "cc" or "bcc" or "subject" or "body":

        1. Assign LCCHECK to HNAME.
      5. Let HVALUE be a string representing a substring of S starting at and including the character at position EQ + 1 to the end of S.
      6. If HNAME equals "to":

        1. If HVALUE is not empty:

          1. If TO is not empty:

            1. Append "%2C%20" to TO.
          2. Append HVALUE to TO.
      7. Else if HNAME equals "cc":

        1. If HVALUE is not empty:

          1. If CC is not empty:

            1. Append "%2C%20" to CC.
          2. Append HVALUE to CC.
      8. Else if HNAME equals "bcc":

        1. If HVALUE is not empty:

          1. If BCC is not empty:

            1. Append "%2C%20" to BCC.
          2. Append HVALUE to BCC.
      9. Else if HNAME equals "subject":

        1. Assign HVALUE to SUBJECT.
      10. Else if HNAME equals "body":

        1. If HVALUE is not empty or BODY is not empty:

          1. If BODY is not empty:

            1. Append "%0D%0A" to BODY.
          2. Append HVALUE to BODY.
      11. Else if HNAME equals "another hname you want to support":

        1. Handle multiple instances of HNAME as desired. However, if it's an address type, it is recommended that you parse it like TO, CC and BCC and handle HNAME in a case-insensitive manner.

    4. Increment I by 1.
  6. Let DTO, DSUBJECT, DBODY, DCC and DBCC be strings of raw utf-8 sequences representing the decoded versions of TO, SUBJECT, BODY, CC and BCC respectively.
  7. Optionally, reencode from DTO, DSUBJECT, DBODY, DCC and DBCC to generate normalized TO, SUBJECT, BODY, CC and BCC values where:

The decoded values are what mail clients use (after converting them to the needed encoding) to fill in a compose form's text fields.
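The parsing steps can be sketched in JavaScript as shown below. parseMailtoPairs and decodePart are hypothetical names, and decodePart uses decodeURIComponent as a stand-in for the decoding steps: unlike those steps, decodeURIComponent throws on input such as %YY, so the stand-in falls back to the raw text. The input is assumed to be the output of the pre-parsing steps.

```javascript
// Stand-in for the decoding steps (an assumption, not this document's decoder).
function decodePart(s) {
    try {
        return decodeURIComponent(s).replace(/\r\n|\r/g, "\n");
    } catch (e) {
        return s; // malformed %HH or invalid UTF-8
    }
}

const BASIC = ["to", "cc", "bcc", "subject", "body"];

// Hypothetical implementation of the parsing steps.
function parseMailtoPairs(p) {
    const enc = { to: "", cc: "", bcc: "", subject: "", body: "" };
    for (const s of p.split("&")) {
        const eq = s.indexOf("=");
        if (eq === -1) continue;                  // no "=": not an hname=hvalue pair
        let hname = decodePart(s.substring(0, eq));
        if (hname === "") continue;
        if (BASIC.includes(hname.toLowerCase())) hname = hname.toLowerCase();
        const hvalue = s.substring(eq + 1);
        if (hname === "to" || hname === "cc" || hname === "bcc") {
            if (hvalue !== "") {
                if (enc[hname] !== "") enc[hname] += "%2C%20";
                enc[hname] += hvalue;
            }
        } else if (hname === "subject") {
            enc.subject = hvalue;                 // the last SUBJECT wins
        } else if (hname === "body") {
            if (hvalue !== "" || enc.body !== "") {
                if (enc.body !== "") enc.body += "%0D%0A";
                enc.body += hvalue;
            }
        }
        // Other hnames you want to support would be handled here.
    }
    // Step 6: the decoded values used to fill in the compose form.
    const decoded = {};
    for (const key of BASIC) decoded[key] = decodePart(enc[key]);
    return decoded;
}

const result = parseMailtoPairs(
    "to=a%40example.com&to=&%74%6F=b%40example.com&CC=c%40example.com" +
    "&subject=one&subject=two&body=&body=line1&body=&body=line2");
console.log(result.to); // a@example.com, b@example.com
```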

Handling null bytes, invalid %HH and unwanted %HH

If your app and/or your parser does not support null bytes, convert "%00" (and raw null bytes) to "%2500" before parsing the mailto URI, so that when decoded they show up in the compose field as "%00". If you are working with a string of characters of a different type and/or width, adjust this normalization accordingly. Use the same kind of normalization for other unsupported bytes and for invalid %HH. For example, in "mailto:?subject=%YY", %YY is not a valid %HH. It's really just a "%" (which should have been encoded as %25) followed by two Y's, so %YY should be treated as %25YY, which decodes to %YY in the subject field.
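A minimal JavaScript sketch of that normalization, run on the whole URI before parsing, is shown below. normalizeUnsupportedEscapes is a hypothetical name, and it only handles null bytes and invalid %HH; other unsupported bytes would need their own rules.

```javascript
// Hypothetical pre-parse normalization for parsers without null-byte support.
function normalizeUnsupportedEscapes(uri) {
    return uri
        // Raw null bytes become "%2500", which decodes to the text "%00".
        .replace(/\u0000/g, "%2500")
        // A "%" that does not start a valid %HH becomes "%25" (a literal "%").
        .replace(/%(?![0-9A-Fa-f]{2})/g, "%25")
        // Encoded null bytes "%00" also become "%2500".
        .replace(/%00/g, "%2500");
}

console.log(normalizeUnsupportedEscapes("mailto:?subject=%YY")); // mailto:?subject=%25YY
```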

Test

mailto link

Click on the link. In your mail client's compose form, you should get the following results.

To:

Cc:

Bcc:

Subject:

Note that there may be extra lines in the body in your mail client if it has signature support.

Also note that for the TO, CC and BCC fields, user input in most clients requires properly-escaped addresses, where certain characters are escaped with "\" and other parts are quoted (see Address syntax). However, some clients allow unescaped user input and will, after decoding an hvalue, deescape characters escaped with "\" when filling in the fields. As long as the intended headers are produced in the outgoing message, the client can use whatever user-input method it wants.

Note that this test currently does not test the matching of an hname if it's encoded. For example, "mailto:?%74%6F=email%40site.com" is the same as "mailto:?to=email%40site.com". This is covered in the parsing section though where the hname is decoded first before lowercasing and checking for a match.

Implementations of parsing, decoding and encoding

In MailtoURIParserPack.zip are C, C++, D, Java, Javascript, Perl, Python, Ruby, Pike, Lua, Tcl, PHP5 and Python3000 MailtoURIParser classes that use the rules in this document to show examples of parsing. The test above uses a Javascript version to parse the mailto link to generate the form data.

Mozilla Thunderbird 3.0 generally parses mailto URIs according to the rules in this document (keeping the deescaping note for the TO, CC and BCC fields in mind). Thunderbird 3.0 passes the test above.

Opera 9.5 generally parses mailto URIs for its built-in mail client according to the rules in this document. It passes the test above.

An Opera UserJS mailto link handler that opens mailto links in various webmails.

HTML forms and the action attribute

The action attribute value of an HTML form can be any valid mailto URI including just "mailto:". When the form is submitted, how the URI is generated before being passed to the mail client depends on the form submission method.

Mailto URIs are NOT of type application/x-www-form-urlencoded and mail clients don't decode + to a space. Spaces in the generated mailto URI must be represented as %20 and NOT +.

GET

When submitting to the mail client with method="get", "mailto:?" plus the encoded data set for the form should be generated. Existing hvalues in the mailto URI will be ignored.

For example, the following form would generate and submit "mailto:?to=to1%40example.com%2C%20to2%40example.com&to=to3%40example.com%2C%20to4%40example.com&cc=cc1%40example.com%2C%20cc2%40example.com&cc=cc3%40example.com%2C%20cc4%40example.com&bcc=bcc1%40example.com%2C%20bcc2%40example.com&bcc=bcc3%40example.com%2C%20bcc4%40example.com&subject=subject%201&subject=subject%202&body=Line%201%0D%0ALine%202&body=Line%203%0D%0ALine%204" to the mail client.

<form action="mailto:?body=This%20hvalue%20will%20be%20removed." method="get">
    <p>
        <input type="text" name="to" value="to1@example.com, to2@example.com">
        <input type="text" name="to" value="to3@example.com, to4@example.com">
        <input type="text" name="cc" value="cc1@example.com, cc2@example.com">
        <input type="text" name="cc" value="cc3@example.com, cc4@example.com">
        <input type="text" name="bcc" value="bcc1@example.com, bcc2@example.com">
        <input type="text" name="bcc" value="bcc3@example.com, bcc4@example.com">
        <input type="text" name="subject" value="subject 1">
        <input type="text" name="subject" value="subject 2">
        <textarea name="body">Line 1
Line 2</textarea>
        <textarea name="body">Line 3
Line 4</textarea>
        <input type="submit" value="Compose">
    </p>
</form>

So, for method="get", action="mailto:" is what you should use.
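A JavaScript sketch of generating that GET URI from a form's name/value pairs follows. formEntriesToMailtoGet is a hypothetical name. It uses URLSearchParams, whose application/x-www-form-urlencoded serializer writes spaces as "+"; since that serializer encodes a literal "+" as %2B, every remaining "+" is a space and can be rewritten to %20.

```javascript
// Hypothetical sketch: builds the GET submission URI from a form's name/value pairs.
function formEntriesToMailtoGet(entries) {
    const params = new URLSearchParams();
    for (const [name, value] of entries) {
        // Normalize newlines to CR LF so that they serialize as %0D%0A.
        params.append(name, value.replace(/\r\n|\r|\n/g, "\r\n"));
    }
    // For GET, hvalues in the action URI are ignored, so start from "mailto:?".
    // The urlencoded serializer writes spaces as "+" and a literal "+" as %2B,
    // so each remaining "+" is a space and becomes %20.
    return "mailto:?" + params.toString().replace(/\+/g, "%20");
}

console.log(formEntriesToMailtoGet([["subject", "subject 1"]])); // mailto:?subject=subject%201
```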

POST

When submitting to the mail client with method="post", the encoded data set is itself encoded and used as the BODY hvalue of the generated mailto URI. A Subject hvalue is also generated that contains the User Agent string, which may be overridden by the last Subject hvalue in the URI if one is present. Also, since this is POST, if there are any existing hvalues in the mailto URI, they are kept.

For example, in the following form, "mailto:?body=This%20hvalue%20will%20NOT%20be%20removed.&subject=Form%20Post%20from%20Opera%2F9.50%20(Windows%20NT%205.1%3B%20U%3B%20en)&body=body%3DJust%2520a%2520test." will be generated.

<form action="mailto:?body=This%20hvalue%20will%20NOT%20be%20removed." method="post">
    <p>
        <input type="text" name="body" value="Just a test.">
        <input type="submit" value="Compose">
    </p>
</form>

Also see JS parsing examples and createMailtoURIFromEnabledFormControls() in BeforeMailtoURL.js for details.
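The POST case can be sketched in JavaScript as follows, reproducing the example above. formEntriesToMailtoPost and encodePair are hypothetical names, and the user agent string is passed in as a parameter. The order of the generated hvalues (SUBJECT, then BODY, after any hvalues already in the action URI) is taken from that example rather than from a rule stated here.

```javascript
// Hypothetical helper: one name=value pair of the encoded form data set.
function encodePair([name, value]) {
    return encodeURIComponent(name) + "=" +
        encodeURIComponent(value.replace(/\r\n|\r|\n/g, "\r\n"));
}

// Hypothetical sketch of the POST case.
function formEntriesToMailtoPost(action, entries, userAgent) {
    const dataSet = entries.map(encodePair).join("&");
    const subject = encodeURIComponent("Form Post from " + userAgent);
    // Hvalues already in the action URI are kept; new pairs are appended after them.
    const separator = action.indexOf("?") === -1 ? "?" : "&";
    // The encoded data set is encoded again to become the BODY hvalue.
    return action + separator + "subject=" + subject + "&body=" + encodeURIComponent(dataSet);
}

const postURI = formEntriesToMailtoPost(
    "mailto:?body=This%20hvalue%20will%20NOT%20be%20removed.",
    [["body", "Just a test."]],
    "Opera/9.50 (Windows NT 5.1; U; en)");
console.log(postURI);
```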

action="mailto:" is partially broken, to varying degrees, in Opera, Safari, Firefox and IE. Firefox has the best support; its only problem is that it encodes spaces as +.

Resolving mailto links

When browsers resolve mailto URIs during markup parsing or input submission, raw white space and non-ASCII characters (even if the URI is represented as a non-ASCII string) should be UTF-8 percent-encoded to %HH. For example, after "mailto:√?subject=1 2√" is entered into a browser's address field and Enter is pressed, it should be resolved to "mailto:%E2%88%9A?subject=1%202%E2%88%9A". It is very important that this happens before a mailto URI is passed on the command line: raw quotes, backslashes and wide characters can split up commands and execute arbitrary programs (even if the URI is quoted before passing).
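One way to perform that resolution step is sketched below in JavaScript. resolveMailto is a hypothetical name; it percent-encodes raw white space and non-ASCII code points as UTF-8 %HH and leaves everything else, including existing %HH sequences, alone. It also encodes raw double quotes and backslashes, which is an assumption motivated by the command-line concern above rather than a rule stated in this document.

```javascript
// Hypothetical sketch of resolving a mailto URI before handing it to a mail client.
function resolveMailto(uri) {
    const encoder = new TextEncoder();
    // The "u" flag makes each match a whole code point, so astral characters
    // (surrogate pairs) are encoded as one UTF-8 sequence.
    return uri.replace(/[\s"\\\u{80}-\u{10FFFF}]/gu, (ch) => {
        let out = "";
        for (const byte of encoder.encode(ch)) {
            out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
        }
        return out;
    });
}

console.log(resolveMailto("mailto:%2E?subject=%2E")); // mailto:%2E?subject=%2E (unchanged)
```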

Also, when browsers resolve, if a mailto URI contains a %HH sequence representing a character that is not reserved, the %HH sequence should be left alone (at least in places like the address field, where a user might be manually editing the URI, or for links, where the user might "Copy link address"). It should not be decoded just because it doesn't need to be encoded. For example, "mailto:%2E?subject=%2E" should not be resolved to "mailto:.?subject=.". The URI should stay as the author of the link intended, if possible.

Also, a browser's "Copy link address" option should copy the mailto link to the clipboard in its fully-encoded, resolved state.

Also, by default, browsers should display mailto links in status bars and tooltips in their fully-encoded, resolved state. mailto URIs can lose their meaning when shown in decoded form, which doesn't help the user; the user will see everything in decoded form in the compose preview anyway. However, if there's a way for the user to distinguish between a real separator and decoded text that looks like a separator (for example, by changing the color of the real separator), showing the URI in decoded form may be O.K.

A browser's "Send link by mail" feature should generally work like this Javascript example:

function UTF8PercentEncodeWithNormalizedNewlines(s) {
    try {
        // Normalize raw newlines first so that *if* there are any newlines 
        // in s, \r\n, stray \r and \n all come out as %0D%0A.
        return encodeURIComponent(s.replace(/\r\n|\r|\n/g, "\r\n"));
    } catch (e) {
        return "Error%20encoding%20data.";
    }
}

function UTF8PercentEncodeWithNewlinesStripped(s) {
    try {
        return encodeURIComponent(s.replace(/\r|\n/g, ""));
    } catch (e) {
        return "Error%20encoding%20data.";
    }
}

function sendLinkByMail() {
    var subject = UTF8PercentEncodeWithNewlinesStripped(document.title);
    var body = UTF8PercentEncodeWithNormalizedNewlines("<" + document.location + ">");
    var uri = "mailto:?subject=" + subject + "&body=" + body;
    window.open(uri);
}

// For Thunderbird 2 (not 3) with HTML composition turned on for example

function sendLinkByMailBodyIsHTML() {
    var subject = UTF8PercentEncodeWithNewlinesStripped(document.title);
    var body = UTF8PercentEncodeWithNormalizedNewlines("&lt;" + document.location + "&gt;");
    var uri = "mailto:?subject=" + subject + "&body=" + body;
    window.open(uri);
}

// For Thunderbird 2 (not 3) with HTML composition turned on for example

function sendLinkByMailBodyIsHTMLActualLink() {
    var subject = UTF8PercentEncodeWithNewlinesStripped(document.title);
    var link = '<a href="' + document.location + '">' + document.location + '</a>';
    var body = UTF8PercentEncodeWithNormalizedNewlines(link);
    var uri = "mailto:?subject=" + subject + "&body=" + body;
    window.open(uri);
}

Contact

Michael A. Puls II