Modern Mailto URI Scheme

Abstract

This draft document defines the format and handling of mailto URIs. It is a standalone specification and does not rely on other specifications.

Introduction

Mailto URIs are used to specify email message header data. Mail clients use this data to generate default header values and default compose form text field values.

Syntax

"mailto:?" + "&-separated list of hname=hvalue pairs"

All hname=hvalue pairs are optional.

"mailto:" and hnames are handled in a case-insensitive manner. Lowercase is preferred where possible.

In HTML and XML markup, "&" must be represented as "&amp;".
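As an illustration of the markup rule above, the following Python sketch escapes a mailto URI for use inside an HTML attribute, so that each "&" separator becomes "&amp;". The function name escape_for_markup is hypothetical; html.escape is from the Python standard library. This is only one possible way to do it.

```python
import html


def escape_for_markup(uri: str) -> str:
    """Escape a mailto URI so it can be placed in an HTML/XML attribute.

    html.escape turns "&" into "&amp;" (and also escapes <, > and quotes).
    """
    return html.escape(uri, quote=True)


if __name__ == "__main__":
    uri = "mailto:?to=email%40example.com&subject=mailto%20uri%20scheme"
    print('<a href="' + escape_for_markup(uri) + '">mail</a>')
```

The percent-encoded parts of the URI are left alone; only the markup-significant characters are changed.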

The basic hnames

TO, SUBJECT, BODY, CC and BCC are the basic hnames that should be supported.

Everything after "mailto:" up to (but not including) the first "?" (or the end of the URI if no "?" is present) also represents a TO hvalue. In that position, "mailto:" acts as the TO hname for the value.

Definitions for the basic hvalues

TO
A properly-escaped, encoded, comma-separated list of addresses to send a copy of the message to (with the knowledge of other recipients), where the recipients are expected to take part in the discussion.
CC
A properly-escaped, encoded, comma-separated list of addresses to send a copy of the message to (with the knowledge of other recipients), where the recipients are at least expected to read the discussion.
BCC
A properly-escaped, encoded, comma-separated list of addresses to send a copy of the message to (without the knowledge of other recipients), where the recipients are at least expected to read the discussion.
SUBJECT
The subject of the discussion, encoded.
BODY
An encoded, \r\n-separated list of lines representing the body content of the message.

"Comma-separated" refers to separating by a comma and a space. For example: "1, 2, 3, 4, 5" and NOT "1,2,3,4,5". In the case of an hvalue, each comma is encoded as %2C and each space is encoded as %20.

"\r\n-separated" refers to separating by a carriage return (0x0D) and a line feed (0x0A). In the case of an hvalue, each carriage return is encoded as %0D and each line feed is encoded as %0A. For example, "line1%0D%0Aline2".

Address syntax

Address examples (in unencoded form)

(Need to do more than examples)

  • tim@example.com
  • Tim <tim@example.com>
  • "Tim Jones" <tim@example.com>
  • "Timothy \"The man\" Jones" <tim@example.com>
  • "\\ is a backslash" <tim@example.com>
  • "That \"is\" cool" <cool@example.com> (not really)
  • test <tim@example.com> (let us try (nested) comments)
  • x+y@example.com

URI Examples

  • mailto:
  • mailto:?
  • mailto:?to=email%40example.com
  • mailto:?subject=mailto%20uri%20scheme
  • mailto:?body=line1%0D%0Aline2
  • mailto:?cc=email%40example.com
  • mailto:?bcc=email%40example.com
  • mailto:?to=email%40example.com&subject=mailto%20uri%20scheme&body=line1%0D%0Aline2&cc=email%40example.com&bcc=email%40example.com
  • mailto:?to=email1%40example.com%2C%20email2%40example.com%2C%20email3%40example.com
  • mailto:?cc=email1%40example.com%2C%20email2%40example.com%2C%20email3%40example.com
  • mailto:?bcc=email1%40example.com%2C%20email2%40example.com%2C%20email3%40example.com
  • mailto:email%40example.com
  • mailto:email1%40example.com%2C%20email2%40example.com%2C%20email3%40example.com
  • mailto:email%40example.com?subject=mailto%20uri%20scheme&body=line1%0D%0Aline2&cc=email%40example.com&bcc=email%40example.com
  • mailto:?body=line1&body=line2&body=line3
  • mailto:?body=&body=&body=&body=line1&body=line2
  • mailto:?body=&body=&body=&body=line1&body=&body=line3
  • mailto:?body=&body=&body=&body=line1&body=&body=line3&body=&body=&body=
  • mailto:?subject=not%20used&subject=not%20used&subject=used
  • mailto:email1%40example.com?to=email2%40example.com&to=email3%40example.com
  • mailto:?cc=email1%40example.com&cc=email2%40example.com&cc=email3%40example.com
  • mailto:?bcc=email1%40example.com&bcc=email2%40example.com&bcc=email3%40example.com
  • mailto:?%E2%88%9A=%E2%88%9A
  • mailto:?subject=1%2B2%3D3
  • mailto:?subject=raining%20cats%20%26%20dogs
  • mailto:Tim%20%3Ctim%40example.com%3E
  • mailto:%22Tim%20Jones%22%20%3Ctim%40example.com%3E
  • mailto:%22Timothy%20%5C%22The%20man%5C%22%20Jones%22%20%3Ctim%40example.com%3E
  • mailto:%22%5C%5C%20is%20a%20backslash%22%20%3Ctim%40example.com%3E
  • mailto:%22That%20%5C%22is%5C%22%20cool%22%20%3Ccool%40example.com%3E%20(not%20really)
  • mailto:test%20%3Ctim%40example.com%3E%20(let%20us%20try%20(nested)%20comments)
  • mailto:x%2By%40example.com

Note about "string"

"string" in the following encoding, decoding and parsing steps refers to a sequence of 8-bit unsigned characters. If working with a string of characters with a different type and/or width, adjust the steps to conform.

Encoding data

Hvalues and hnames need to be percent-encoded.

  1. Let S be a string of raw utf-8 sequences.
  2. Replace all Carriage Return + Line Feed pairs (0x0D + 0x0A) in S with Line Feeds (0x0A).
  3. Replace all Carriage Returns (0x0D) in S with Line Feeds (0x0A).
  4. Let NOENCODE be the string "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()".
  5. Let HEXITS be the string "0123456789ABCDEF".
  6. Let I be the current position in S starting at 0.
  7. Let RET be the empty string to hold the built value.
  8. While I is less than the length of S:

    1. Let C be the character in S at position I.
    2. If C equals a Line Feed (0x0A):

      1. Append "%0D%0A" to RET.
    3. Else if C is found in NOENCODE:

      1. Append C to RET.
    4. Else:

      1. Append "%" to RET.
      2. Let CI be the 8bit unsigned value (0 to 255) that represents C.
      3. Let F1 be the value of CI >> 4.
      4. Let F2 be the value of CI & 0x0F.
      5. Let A be the character at position F1 in HEXITS.
      6. Let B be the character at position F2 in HEXITS.
      7. Append A to RET.
      8. Append B to RET.
    5. Increment I by 1.
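The encoding steps above can be sketched in Python roughly as follows. The function name encode_component is hypothetical, and the sketch assumes the input is a Python str that is first converted to raw UTF-8 bytes (step 1). It is an illustration of the steps, not a reference implementation.

```python
NOENCODE = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.!~*'()"
)
HEXITS = "0123456789ABCDEF"


def encode_component(text: str) -> str:
    """Percent-encode an hname or hvalue following the steps above."""
    # Steps 1-3: raw UTF-8 bytes, with CRLF and lone CR normalized to LF.
    s = text.encode("utf-8")
    s = s.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

    ret = []
    for ci in s:  # iterating over bytes yields integers 0..255
        c = chr(ci)
        if ci == 0x0A:
            # Every newline is emitted as an encoded CRLF pair.
            ret.append("%0D%0A")
        elif c in NOENCODE:
            ret.append(c)
        else:
            # High nibble and low nibble select the two hex digits.
            ret.append("%" + HEXITS[ci >> 4] + HEXITS[ci & 0x0F])
    return "".join(ret)


if __name__ == "__main__":
    print(encode_component("email1@example.com, email2@example.com"))
    print(encode_component("line1\nline2"))
```

Because the loop works on bytes, a multi-byte UTF-8 character is emitted as one %XX triplet per byte.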

All characters in an hvalue or hname that need to be encoded should be encoded. An "@" used as a separator (as in email@example.com) still needs to be encoded as "%40"; being a separator does not exempt it. The same goes for "+": even though it represents itself (rather than a space) in a mailto URI, it is not exempt from being encoded as "%2B". (Mail clients still need to handle "@" and "+" the same way even if they are not encoded.)

Just like hvalues, hnames can contain full utf-8 sequences. This allows clients to support unicode hnames if needed. For example, mailto:?%E2%88%9A=%E2%88%9A is valid in a mailto URI.

Decoding data

  1. Let S be the encoded string to decode.
  2. Let I be the current position in S, starting at 0.
  3. Let RET be the empty string to hold the built value.
  4. Let HEXITS be the string "0123456789ABCDEF".
  5. While I is less than the length of S:

    1. Let C be the character at position I.
    2. If C equals "%" and I + 2 is less than the length of S:

      1. Let F1 be the uppercase version of the character at position I + 1.
      2. Let F2 be the uppercase version of the character at position I + 2.
      3. Let A be the position of F1 in HEXITS.
      4. Let B be the position of F2 in HEXITS.
      5. If F1 or F2 was not found in HEXITS:

        1. Append C to RET.
      6. Else:

        1. Append the character represented by the 8bit unsigned value (0 to 255) of (A * 16 + B) to RET.
        2. Increment I by 2.
    3. Else:

      1. Append C to RET.
    4. Increment I by 1.
  6. Replace all Carriage Return + Line Feed pairs (0x0D + 0x0A) in RET with Line Feeds (0x0A).
  7. Then, replace all Carriage Returns (0x0D) in RET with Line Feeds (0x0A).

RET will be a string of raw utf-8 sequences with all newline pairs and single newlines normalized to \n.
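A minimal Python sketch of the decoding steps might look like this. The function name decode_component is hypothetical. The sketch returns bytes (the "raw utf-8 sequences") and leaves conversion to text to the caller.

```python
HEXITS = b"0123456789ABCDEF"


def decode_component(encoded: str) -> bytes:
    """Percent-decode an hname or hvalue following the steps above."""
    s = encoded.encode("utf-8")  # treat the input as 8-bit characters
    ret = bytearray()
    i = 0
    while i < len(s):
        c = s[i]
        if c == 0x25 and i + 2 < len(s):  # 0x25 is "%"
            # Look up both hex digits case-insensitively.
            a = HEXITS.find(s[i + 1:i + 2].upper())
            b = HEXITS.find(s[i + 2:i + 3].upper())
            if a == -1 or b == -1:
                ret.append(c)  # not a valid escape: keep the "%" as-is
            else:
                ret.append(a * 16 + b)
                i += 2
        else:
            ret.append(c)
        i += 1
    # Final steps: normalize CRLF pairs and lone CRs to LF.
    return bytes(ret.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))


if __name__ == "__main__":
    print(decode_component("line1%0D%0Aline2"))
    print(decode_component("%e2%88%9a").decode("utf-8"))
```

A "%" that is not followed by two hex digits passes through unchanged, as the steps require.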

Handling of duplicate hnames

In a mailto URI, there can be more than one hname with the same name. When a client parses a mailto URI to generate a TO, CC, BCC, SUBJECT or BODY value, the following rules should be followed.

The decoded version of the generated TO value is what browsers should use for the mailto link "Copy email address" feature. "Copy subject", "Copy body", "Copy CC addresses" and "Copy BCC addresses" features should use the decoded version of their corresponding generated value.

If the client supports other address-based hnames, the TO, CC and BCC rules can be used if desired.

Duplicate hnames were generally allowed so that clients with hvalue length restrictions could still accept large amounts of data for an hname. Even clients that don't have hvalue length restrictions should support duplicate hnames for compatibility.

Rationale

For TO, CC and BCC (and generally other address types), there is no need to join empty hvalues. You'll end up getting ", , , , ," if you do.

For SUBJECT, since there can be only one subject in a message, make each new SUBJECT hvalue override the previous one. Also, there are existing implementations that already do this.

For BODY, do not join empty BODY hvalues until there is a BODY hvalue with significant content (non-empty). That way, empty body hvalues do not cause a bunch of empty lines at the top of the BODY field before there's significant content.

If you want your mailto URI to be handled correctly in clients that don't support duplicate hnames, don't use duplicate hnames in your URI. Currently, there's not much need to anyway.

Pre-Parsing

  1. Let S be the incoming string.
  2. Let P be an empty string to store the &-separated list of hname=hvalue pairs.
  3. If S starts with a case-insensitive "mailto:":

    1. Let SUB be the substring of S starting at and including the character at position 4 to the end of S.
    2. Replace the first ':' in SUB with '='.
    3. If one or more '?' are present in SUB:

      1. Replace only the first '?' with '&'.
    4. Assign SUB to P.
  4. Else:

    1. Fail.
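The pre-parsing steps can be sketched in Python as below. The function name pre_parse is hypothetical, and "Fail" is modeled here as raising ValueError, which is an assumption about how a client would report the error.

```python
def pre_parse(s: str) -> str:
    """Turn a mailto URI into an &-separated list of hname=hvalue pairs."""
    if s[:7].lower() != "mailto:":
        raise ValueError("not a mailto URI")
    # Dropping the first four characters ("mail") leaves "to:...".
    sub = s[4:]
    # "to:" becomes "to=", so the leading address list is an ordinary pair.
    sub = sub.replace(":", "=", 1)
    # Only the first "?" separates the address part from the other pairs.
    sub = sub.replace("?", "&", 1)
    return sub


if __name__ == "__main__":
    print(pre_parse("mailto:email%40example.com?subject=mailto%20uri%20scheme"))
```

After this step the leading address list needs no special handling: it is just another "to" pair.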

Parsing

  1. Let P be the string with the &-separated list of hname=hvalue pairs.
  2. Split P by '&' into array HLIST. (need to clarify 'split' exactly)
  3. Let I be the current position in HLIST starting at 0.
  4. Let TO, SUBJECT, BODY, CC and BCC be empty strings to store the encoded values.
  5. While I is less than the length of HLIST:

    1. Let S be the string at position I in HLIST.
    2. Let EQ be the position of the first '=' in S.
    3. If a '=' is found in S:

      1. Let HNAME be a string representing the lowercase version of a substring of S starting at and including the character at position 0, up to but not including the character at position EQ.
      2. Let HVALUE be a string representing a substring of S starting at and including the character at position EQ + 1 to the end of S.
      3. If HNAME equals "to":

        1. If HVALUE is not empty:

          1. If TO is not empty:

            1. Append "%2C%20" to TO.
          2. Append HVALUE to TO.
      4. Else if HNAME equals "cc":

        1. If HVALUE is not empty:

          1. If CC is not empty:

            1. Append "%2C%20" to CC.
          2. Append HVALUE to CC.
      5. Else if HNAME equals "bcc":

        1. If HVALUE is not empty:

          1. If BCC is not empty:

            1. Append "%2C%20" to BCC.
          2. Append HVALUE to BCC.
      6. Else if HNAME equals "subject":

        1. Assign HVALUE to SUBJECT.
      7. Else if HNAME equals "body":

        1. If (HVALUE is empty and BODY is empty) equals false:

          1. If BODY is not empty:

            1. Append "%0D%0A" to BODY.
          2. Append HVALUE to BODY.
      8. Else if HNAME equals "another hname you want to support":

        1. Parse it as desired. However, if it's an address type, it is recommended that you parse it like TO, CC and BCC.

    4. Increment I by 1.
  6. Let DTO, DSUBJECT, DBODY, DCC and DBCC be strings of raw utf-8 sequences representing the decoded versions of TO, SUBJECT, BODY, CC and BCC respectively.
  7. Optionally, reencode from DTO, DSUBJECT, DBODY, DCC and DBCC to generate normalized TO, SUBJECT, BODY, CC and BCC values where:
    • Characters that were encoded, but don't need to be, are not encoded.
    • Characters that were not encoded, but need to be, are encoded.
    • There are no stray %0D or %0A and only %0D%0A.

The decoded values are what mail clients use (after converting them to the needed encoding) to fill in a compose form's text fields.
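Steps 1 to 5 of the parsing procedure, including the duplicate-hname rules, can be sketched in Python as follows. The function name parse_pairs is hypothetical, and 'split' is assumed to behave like Python's str.split. The sketch returns the still-encoded values; decoding (step 6) and optional re-encoding (step 7) are left out.

```python
def parse_pairs(p: str) -> dict:
    """Collect encoded TO, CC, BCC, SUBJECT and BODY values from the pairs."""
    fields = {"to": "", "cc": "", "bcc": "", "subject": "", "body": ""}
    for item in p.split("&"):
        eq = item.find("=")
        if eq == -1:
            continue  # no "=" in this item: nothing to record
        hname = item[:eq].lower()
        hvalue = item[eq + 1:]
        if hname in ("to", "cc", "bcc"):
            # Address lists: skip empty values, join the rest with ", ".
            if hvalue:
                if fields[hname]:
                    fields[hname] += "%2C%20"
                fields[hname] += hvalue
        elif hname == "subject":
            # The last SUBJECT wins.
            fields["subject"] = hvalue
        elif hname == "body":
            # Ignore empty values until the body has some content,
            # then join every value (empty or not) with CRLF.
            if hvalue or fields["body"]:
                if fields["body"]:
                    fields["body"] += "%0D%0A"
                fields["body"] += hvalue
    return fields


if __name__ == "__main__":
    print(parse_pairs("to=&body=&body=line1&body=&body=line3"))
```

Empty BODY values before the first non-empty one are dropped, while empty values after it each contribute a blank line.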

Handling null bytes

If your app and/or your parser does not support null bytes, before parsing the mailto URI, convert "%00" (and raw null bytes) to "%2500" so that when decoding, they show up in the compose field as "%00". If working with a string of characters with a different type and/or width, adjust this normalization accordingly. Use this same type of normalization for other bytes that are not supported.
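The null-byte normalization described above can be sketched in Python as a pair of replacements applied to the URI before parsing. The function name protect_null_bytes is hypothetical.

```python
def protect_null_bytes(uri: str) -> str:
    """Rewrite encoded and raw null bytes so they decode to the text "%00"."""
    # "%2500" decodes to "%00": "%25" is an encoded "%", followed by "00".
    uri = uri.replace("%00", "%2500")
    # Raw null bytes get the same treatment.
    return uri.replace("\x00", "%2500")


if __name__ == "__main__":
    print(protect_null_bytes("mailto:?body=a%00b"))
```

The replacement text "%2500" does not itself contain "%00", so applying the first replacement cannot create new matches.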

Test

mailto link

Click on the link. In your mail client's compose form, you should get the following results.

To:

Cc:

Bcc:

Subject:

Note that there may be extra lines in the body in your mail client if it has signature support.

Also note that for the TO, CC and BCC fields, user input in most clients requires properly escaped addresses, where certain characters are escaped with "\" and other parts are quoted (see Address syntax). However, some clients allow unescaped user input and will, after decoding an hvalue, de-escape characters escaped with "\" when filling in the fields. As long as the intended headers are produced in the outgoing message, the client can use whatever user-input method is desired.

Implementations of parsing, decoding and encoding

In MailtoURIParserPack.zip are C, C++, D, Java, JavaScript, Perl, Python, Ruby, Pike, Lua, Tcl, PHP5 and Python3000 MailtoURIParser classes that use the rules in this document. The test above uses the JavaScript version to parse the mailto link to generate the form data.

Mozilla Thunderbird 3.0 generally parses mailto URIs according to the rules in this document (keeping the deescaping note for the TO, CC and BCC fields in mind). Thunderbird 3.0 passes the test above (if clicking on the mailto link in Firefox with Thunderbird 3.0 as the default mail client).

Opera 9.5 generally parses mailto URIs for its built-in mail client according to the rules in this document (except for a current empty body value quirk).

Contact

Michael A. Puls II