The 'mailto' URI Scheme

Revision 5 - September 26, 2010

Abstract

This draft document defines the syntax, generation and handling of mailto URIs (and IRIs where applicable).

Table of Contents

  1. Introduction
  2. Authoring
    1. Syntax
    2. Authoring Rules
    3. Examples of Encoding and Generating
    4. Authoring Tools
  3. Consuming
  4. How to create a "Send page link by mail" URI
  5. Note on passing mailto URIs from one UA to another
  6. Note on converting mailto IRIs to URIs and resolving
  7. HTML Form Handling

Introduction

Mailto URIs are used to specify email message compose data in a portable format. Mail clients parse and decode this data to generate default header field values and message bodies for compose forms and message sources.

Authoring

For producers of mailto URIs

Syntax

By example:

Authoring Rules

  • The to_hfvalue is optional.
  • hfields (hfname=hfvalue) are optional.
  • "mailto:" MUST be lowercase
  • All hfnames MUST be lowercase.
  • hfnames must be unique. There MUST NOT be duplicate hfnames. This also means that if you specify a to_hfvalue, then you MUST NOT include a "to" hfname. A "to" hfname SHOULD ONLY be generated by an HTML form. Other generators SHOULD emit a to_hfvalue.
  • Except for to_hfvalue, hfnames and hfvalues MUST NOT be empty.
  • IF there are no hfields, there MUST NOT be any '?' or "&" or "=".
  • IF there are hfields present, the number of "&" MUST be the number of hfields - 1.
  • In an hfield, there MUST ONLY be one "=".
  • There MUST ONLY ever be one '?' present in a URI.
  • '#' MUST NOT be present anywhere in a mailto URI. (percent-encode it as %23 instead)
  • IF a mailto URI contains no hfields, the to_hfvalue MUST NOT consist ONLY of digits. For example, "mailto:8080" is not allowed.
  • The to_hfvalue, hfnames and hfvalues MUST be generated by percent-encoding all characters in the source value that are NOT "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()". Characters in the list MUST NOT be percent-encoded. This also applies to the HTML form action attribute value where ' ' (a space, \x20) MUST be emitted as "%20" and '+' MUST be emitted as "%2B".
  • Before a value can be percent-encoded, characters in the range of \x00 through \x08 and \x0B through \x0C and \x0E through \x1F must be removed as they're unsafe.
  • Before a value can be percent-encoded, stray \x0D and stray \x0A must be converted to \x0D\x0A.
  • %HH MUST be in uppercase.
  • There MUST NOT be any invalid %HH or stray "%".
  • %HH MUST represent valid UTF-8 sequences.
  • The to_hfvalue and hfvalues for hfnames of "to", "cc", "bcc" and "subject" SHOULD NOT contain \x0D or \x0A. This SHOULD apply to other hfvalues that are known to be for single-line headers/fields.
  • The to_hfvalue and hfvalues for hfnames of "to", "cc" and "bcc", if utf-8-percent-decoded, SHOULD represent valid RFC5322 header field bodies that represent addr-spec/addr-spec group lists. This SHOULD apply to other hfvalues that are known to be for headers/fields that contain RFC5322 addr-specs.
  • hfnames and hfvalues must be UTF-8-based (as in UTF-8 sequences or %HH representing UTF-8 sequences).
  • hfvalues, once utf-8-percent-decoded SHOULD represent valid RFC5322 header field bodies (except for the "body" hfvalue which, after being utf-8-percent-decoded, SHOULD represent a valid RFC5322 message body).
  • hfnames, once utf-8-percent-decoded SHOULD represent valid RFC5322 header field names.
  • mailto URIs in public documents SHOULD NOT contain "bcc" hfvalues.
  • If a mailto URI is included in HTML or XML markup, "&" SHOULD be escaped as "&amp;".
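Several of the structural rules above can be checked mechanically. The following is a minimal sketch, not a complete validator: `checkMailtoURI` is a hypothetical name, it covers only the shape, %HH, digits-only, lowercase-hfname and duplicate-hfname rules, and it does not check whether decoded values are valid RFC5322 content. The lowercase check only looks at raw letters, so an uppercase letter hidden in %HH would be missed.

```javascript
// Hypothetical helper, not part of this document: returns a list of
// problems found in a mailto URI (an empty list means none were found).
function checkMailtoURI(uri) {
    // One or more unreserved characters or uppercase %HH.
    var token = "(?:[A-Za-z0-9\\-_.!~*'()]|%[0-9A-F]{2})+";
    var hfield = token + "=" + token;
    var shape = new RegExp("^mailto:(" + token + ")?(?:\\?(" + hfield + "(?:&" + hfield + ")*))?$");
    var m = shape.exec(uri);
    if (!m) {
        return ["does not have the shape mailto:to_hfvalue?hfname=hfvalue&hfname=hfvalue"];
    }
    var problems = [];
    var to = m[1] || "";
    var hfields = m[2] ? m[2].split("&") : [];
    try {
        // decodeURIComponent throws if %HH is not valid UTF-8.
        decodeURIComponent(uri.substring(7));
    } catch (e) {
        problems.push("%HH does not represent valid UTF-8");
    }
    if (hfields.length === 0 && /^[0-9]+$/.test(to)) {
        problems.push("to_hfvalue consists only of digits");
    }
    var seen = Object.create(null);
    for (var i = 0; i < hfields.length; ++i) {
        var hfname = hfields[i].substring(0, hfields[i].indexOf("="));
        // Ignore %HH when looking for raw uppercase letters.
        if (/[A-Z]/.test(hfname.replace(/%[0-9A-F]{2}/g, ""))) {
            problems.push("hfname is not lowercase: " + hfname);
        }
        if (hfname in seen || (hfname === "to" && to !== "")) {
            problems.push("duplicate hfname: " + hfname);
        }
        seen[hfname] = true;
    }
    return problems;
}
```

For example, `checkMailtoURI("mailto:a%40example.com?subject=Hi")` returns an empty list, while `checkMailtoURI("mailto:8080")` reports the digits-only problem.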

Examples of Encoding and Generating

Here's an ECMAScript example of how to generate an hfname or hfvalue from a string while using the rules above:

// Characters that are NOT percent-encoded: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()"

function encode(s) {
    return encodeURIComponent(s.replace(/[\x00-\x08]|[\x0B-\x0C]|[\x0E-\x1F]/g, "").replace(/\r\n|\r|\n/g, "\r\n"));
}
function stripNewlines(s) {
    return s.replace(/\r|\n/g, "");
}
var uri = "mailto:".toLowerCase();
uri += encode(stripNewlines("to1@example.com, to2@example.com"));
uri += "?";
uri += encode("subject".toLowerCase());
uri += "=";
uri += encode(stripNewlines("mailto URIs are fun!"));
uri += "&";
uri += encode("body".toLowerCase());
uri += "=";
uri += encode("line1\r\nline2");
uri += "&";
uri += encode("cc".toLowerCase());
uri += "=";
uri += encode(stripNewlines("cc1@example.com, cc2@example.com"));
uri += "&";
uri += encode("bcc".toLowerCase());
uri += "=";
uri += encode(stripNewlines("bcc1@example.com, bcc2@example.com"));
alert(uri);
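
The rule about escaping "&" in markup applies when a generated URI is written into an attribute value. As a rough sketch (the helper name `escapeForAttribute` is an assumption, and the extra `"` and `<` escapes are a precaution, since a URI that follows the rules above contains neither):

```javascript
// Hypothetical helper: escapes a URI for use inside a double-quoted
// HTML or XML attribute value.
function escapeForAttribute(uri) {
    return uri.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

var link = '<a href="' + escapeForAttribute("mailto:?subject=a&body=b") + '">Mail</a>';
// link is '<a href="mailto:?subject=a&amp;body=b">Mail</a>'
```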

Authoring Tools

mailto URI Composer

mailto URI Validator

Consuming

For consumers of mailto URIs

Authoring mailto URIs is quite simple. Handling them is more complex. You need to convert the URI to a dataset, parse it and percent-decode it to get the raw data, and you need to do this in a consistent way for all mailto URIs, including invalid ones.

Is it a mailto URI?

You have a mailto URI if the string in question starts with a case-insensitive "mailto:". "MAILTO:", "mAiLtO:", "MaIlTo:" and "mailto:" are all mailto URIs.
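
A minimal sketch of this check (the function name `isMailtoURI` is an assumption):

```javascript
// Hypothetical helper: true if the string starts with "mailto:" in any case.
function isMailtoURI(s) {
    return /^mailto:/i.test(s);
}
```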

Make the URI data safe for processing and fix percent-encoding

There are a few things you need to do with the URI before you can process it.

  1. Codepoints \x00 through \x08, \x0B through \x0C and \x0E through \x1F are unsafe. Convert them to %HH so that there are no unsafe raw characters in the string. It is especially important to convert \x00 in null-terminated strings.
  2. Convert invalid %HH to %25HH. For example, %3y is not a valid %HH. The %HH contains a character that is not a case-insensitive "0123456789ABCDEF". You need to convert it to %253y so that strict percent-decoders that throw an error on invalid %HH can decode it to "%3y".
  3. Convert %HH representing the unsafe codepoints above to %25HH. For example, "%00" is unsafe. You convert it to "%2500" so that when it's percent-decoded, it comes out as "%00". This makes the percent-decoder treat unsafe %HH literally.
  4. Convert stray \r, stray \n, stray %0D (case-insensitive) and stray %0A (case-insensitive) to "%0D%0A".
  5. Convert all "+" to "%2B". You do this just in case the percent-decoding function you use later decodes "+" to a space.

Here's a Javascript example:

function escapeUnsafeRaw(s) {
    // Percent-encode raw control characters other than \t, \n and \r.
    return s.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F]/g, function(match) {
        try {
            return encodeURIComponent(match);
        } catch (e) {
            return "";
        }
    });
}
function escapeInvalidHH(s) {
    // A "%" that is not followed by two hex digits becomes "%25".
    return s.replace(/%(?![0-9A-F]{2})/gi, function() {
        return "%25";
    });
}
function normalizeNewlines(s) {
    // Raw newlines first, then percent-encoded ones.
    return s.replace(/\r\n|\r|\n/g, "%0D%0A").replace(/%0D%0A|%0D|%0A/gi, "%0D%0A");
}
function escapeUnsafeHH(s) {
    // Matches %00-%08, %0B, %0C and %0E-%1F (case-insensitive).
    return s.replace(/%(0[0-8BCEF]|1[0-9A-F])/gi, function(match, hh) {
        return "%25" + hh;
    });
}
function escapePlus(s) {
    return s.replace(/\+/g, "%2B");
}

function makeURISafe(s) {
    return normalizeNewlines(escapePlus(escapeUnsafeHH(escapeInvalidHH(escapeUnsafeRaw(s)))));
}
var uri = "mailto:\0%00\n\r\n\r%3y%5e%0A%0D%0A%0D+";
alert(makeURISafe(uri)); // "mailto:%2500%2500%0D%0A%0D%0A%0D%0A%253y%5e%0D%0A%0D%0A%0D%0A%2B"

Converting the URI to a dataset

Before you can process the mailto URI, you need to convert it into an &-separated list of hfname=hfvalue pairs.

This is done in a few steps:

  1. Strip everything from and including the first '#' to the end of the URI. This will get rid of the fragment identifier if there is one. Since mailto URIs currently are not supposed to have fragment identifiers, it's best to strip them to discourage use of them. That way, in the future, if fragment identifiers in mailto URIs are used for something, there won't be any legacy (and perhaps incorrect) fragment identifier handling to support.
  2. Replace 'mailto:' with "to=".
  3. Convert all & before "?" (or the end of the string if there isn't a "?") to %26. This is needed so splitting by & later works correctly.
  4. Convert the first "?" to "&". You only convert the first "?" because any extra ones are considered invalid and will just be treated as part of an hfname or hfvalue.

Here's a Javascript example:

var uri = "mailto:&&&foo?x=1&y=2?#x#y#z";

// /#.*/ also strips a lone trailing "#", which /#.+/ would leave behind.
var dataset = "to=" + uri.replace(/#.*/, "").substring(7).replace(/^[^?]+/, function(match) {
    return match.replace(/&/g, "%26");
}).replace(/\?/, "&");

alert(dataset); // "to=%26%26%26foo&x=1&y=2?"

Splitting the dataset

Once you have a dataset to work with, you need to split it by "&" into a bunch of tokens. Each token will then represent an hfield.

Here's a Javascript example:

var dataset = "to=%26%26%26foo&x=1&y=2?";
var hfields = dataset.split("&");
for (var i = 0; i < hfields.length; ++i) {
    if (hfields[i].indexOf("=") !== -1) {
        alert(hfields[i]);
        // "to=%26%26%26foo"
        // "x=1"
        // "y=2?"
    }
}

As you can see, tokens that don't contain an "=" are skipped. If there's no "=" in a token, there's no hfname or hfvalue.

Splitting an hfield

To get the hfname and hfvalue from an hfield, you need to split it by the first "=". You split it by the first "=" so that any extra "=" will be treated as part of the hfvalue.

Here's a Javascript example:

var hfield = "x==1";
var eq = hfield.indexOf("=");
var hfname = hfield.substring(0, eq);
var hfvalue = hfield.substring(eq + 1);
alert(hfname); // "x"
alert(hfvalue); // "=1"

UTF-8-Percent-decoding an hfname or hfvalue

Percent-decoding just involves converting each %HH sequence to the raw codepoint it represents.

Here's a Javascript example:

function decode(s) {
    try {
        return decodeURIComponent(s);
    } catch (e) {
        return "";
    }
}
alert(decode("%5E%E2%88%9A")); // "^√" or "^\u221A"

Note that '+' is NOT decoded to a space. It is left alone.

Also note that if you skipped the steps under "Make the URI data safe for processing and fix percent-encoding", you'll have to perform them here.

Converting an hfname to a name

An hfname is a UTF-8-percent-encoded representation of a raw name value. To convert an hfname to a name value, you UTF-8-percent-decode it and then convert it to lowercase.

Converting an hfvalue to a value

An hfvalue is a UTF-8-percent-encoded representation of a raw value. To convert an hfvalue to a value, you just UTF-8-percent-decode it.
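
The two conversions can be sketched as follows. The `decode` function repeats the one from the percent-decoding example; `hfnameToName` and `hfvalueToValue` are hypothetical names used only for illustration.

```javascript
// Same behavior as the decode() example: invalid input yields "".
function decode(s) {
    try {
        return decodeURIComponent(s);
    } catch (e) {
        return "";
    }
}
// Hypothetical helper: decode, then lowercase.
function hfnameToName(hfname) {
    return decode(hfname).toLowerCase();
}
// Hypothetical helper: decode only; case and "+" are left alone.
function hfvalueToValue(hfvalue) {
    return decode(hfvalue);
}
```

Lowercasing after decoding means that "SUBJECT", "Subject" and "%53ubject" all yield the name "subject".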

Special processing for certain value types

Values for the names "to", "cc", "bcc", and "subject" are considered single-line fields. \r and \n should be stripped from them.

For example, if you had "mailto:line1%0D%0Aline2", the value for "to" would be "line1line2".

This SHOULD apply to all other values that are known to be single-line values.

Here's a Javascript example:

alert("line1\r\nline2".replace(/\r|\n/g, "")); // "line1line2"

Accumulating values for duplicate names

You might find that an author of a mailto URI specified duplicate hfnames. For example, as a consumer of a mailto URI, you might want to handle "mailto:?cc=1&cc=2" as the author intended.

There are 4 different types of accumulation:

Values for "to", "cc" and "bcc" and other values that are known to carry addresses are of the address type.

Values for "body" and other values that are known to contain multiple lines are multi-line.

Values for "subject" and other values that should only contain a single line are single-line.

Values of unknown type should follow the standard rule.

If you do not want to do any accumulation, use the standard rule for all values, but still honor the special processing for certain value types when it applies.

Full parsing example

Here is a full parsing example in Javascript:

// Example
See mailto_uri_parser.js for now.
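
Until then, the consuming steps described above can be strung together roughly as follows. This is a sketch under stated assumptions, not the contents of mailto_uri_parser.js: `parseMailtoURI` is a hypothetical name, the input is assumed to start with "mailto:" and to have already been made safe for processing, and no accumulation is performed, so duplicate names simply appear as separate entries.

```javascript
// Hypothetical sketch: returns an array of { name, value } objects in
// URI order. Assumes `uri` starts with "mailto:" (any case) and has
// already been made safe for processing.
function parseMailtoURI(uri) {
    function decode(s) {
        try {
            return decodeURIComponent(s);
        } catch (e) {
            return "";
        }
    }
    var singleLine = { to: true, cc: true, bcc: true, subject: true };

    // Strip the fragment, turn "mailto:" into "to=", protect "&" before
    // the first "?", then turn the first "?" into "&".
    var dataset = "to=" + uri.replace(/#[\s\S]*$/, "").substring(7)
        .replace(/^[^?]+/, function(match) {
            return match.replace(/&/g, "%26");
        })
        .replace(/\?/, "&");

    var pairs = [];
    var hfields = dataset.split("&");
    for (var i = 0; i < hfields.length; ++i) {
        var eq = hfields[i].indexOf("=");
        if (eq === -1) {
            continue; // no "=", so no hfname or hfvalue
        }
        var name = decode(hfields[i].substring(0, eq)).toLowerCase();
        var value = decode(hfields[i].substring(eq + 1));
        if (singleLine.hasOwnProperty(name)) {
            value = value.replace(/\r|\n/g, ""); // single-line fields
        }
        pairs.push({ name: name, value: value });
    }
    return pairs;
}
```

Returning an ordered list rather than an object keyed by name keeps duplicates visible, so a caller can apply whichever accumulation rule it chooses afterwards.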
                

Note on passing mailto URIs from one UA to another

When a browser, for example, passes a mailto URI (from the address field or an HTML link) to another UA (a mail client, for example) and the mailto URI contains a fragment identifier, the browser MUST NOT strip it. It is up to the mail client, not the browser, to properly discard the fragment identifier. One reason for this is that the browser is not the consumer. It's just passing the URI along (after normalizing it, escaping it and quoting it so that it's safe to pass on the command line) to the client. Another reason is that if mailto URIs are later defined to make use of fragment identifiers and mail clients start supporting them, browsers won't have to update their passing code, and the full URI, including the fragment identifier, will make it to the client.

Note on converting mailto IRIs to URIs and resolving

When a browser resolves a mailto IRI (in an HTML href attribute, for example), passes a mailto IRI to another client, or when a user copies a mailto IRI to the clipboard, the IRI MUST be converted to a mailto URI using UTF-8, regardless of the page's encoding. This is important because most clients (including webmails) expect only UTF-8-based mailto URIs.

For example, even in a Shift-JIS page, an href attribute value of "mailto:?subject=√" must be resolved to "mailto:?subject=%E2%88%9A". Further, an href attribute value of "mailto:?subject=%E2%88%9A" in a Shift-JIS page must still resolve to "mailto:?subject=%E2%88%9A".
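
In script, where strings are already Unicode, the conversion can be sketched as below. `mailtoIRIToURI` is a hypothetical name, and replacing an unencodable run with U+FFFD is an assumption of this sketch rather than something this document specifies. Only non-ASCII characters are touched, so existing %HH sequences pass through unchanged.

```javascript
// Hypothetical helper: percent-encodes every run of non-ASCII characters
// as UTF-8 and leaves ASCII, including existing %HH, untouched.
function mailtoIRIToURI(iri) {
    return iri.replace(/[^\x00-\x7F]+/g, function(run) {
        try {
            return encodeURIComponent(run);
        } catch (e) {
            // A run containing a lone surrogate cannot be encoded; this
            // sketch replaces the whole run with U+FFFD (an assumption).
            return "%EF%BF%BD";
        }
    });
}
```

Note that `encodeURI` is not used for the whole string because it would turn an existing "%E2" into "%25E2".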

Note however that some browsers will show IRIs in the status field and address bar instead of URIs. The href property for links and the location property for documents should still return URI values though.

HTML form handling

HTML form submission (when the action attribute starts with 'mailto:') is specified in HTML5 under the Form Submission Algorithm (see the table under step 15) and in the Mail with Headers and Mail as Body sections.

The following is a Javascript example of how the outgoing mailto URI is generated for POST when enctype is "application/x-www-form-urlencoded", and for all other methods regardless of enctype. (Handling of POST with other enctype values is not shown in the example.)

HTMLFormElement.prototype.createMailtoURIFromFormData = function() {
    var destination = "mailto:";
    if (this.action.search(/mailto:/i) === 0) {
        var headers = this.createDatasetFromActiveFormElements();
        destination = this.action;
        // Avoid ambiguous use of +. (Not in HTML5).
        destination = destination.replace(/\+/g, "%2B");
        var qm = destination.indexOf('?');
        if (this.method === "post") {
            var body = encodeURIComponent(headers);
            if (qm === -1) {
                destination += '?';
            } else {
                destination += '&';
            }
            destination += "body=";
            destination += body;
        } else {
            if (qm !== -1) {
                destination = destination.substring(0, qm);
            }
            destination += '?';
            destination += headers.replace(/\+/g, "%20");
        }
    }
    return destination;
};

Note that 'createDatasetFromActiveFormElements' above is an example representing the dataset you get by running the HTML5 form submission algorithm.

Also, before submitting a mailto form, the UA SHOULD present the user with a confirmation dialog as the submission will usually launch an external program.

Contact

Michael A. Puls II