This document defines the format and handling of mailto URIs. It is a standalone document and does not rely on other specifications.
Introduction
Mailto URIs are used to specify email message header data. Mail clients use this data to generate default header values and default compose form text field values.
A mailto URI is a string of the form: "mailto:" + "TO hvalue" + "?" + "&-separated list of hname=hvalue pairs"
TO, SUBJECT, BODY, CC and BCC are the basic hnames that should be supported.
Everything after "mailto:" on up to (but not including) "?" (or the end of the URI if a "?" is not present) also represents a TO hvalue (where "mailto:" represents a TO hname).
All hname=hvalue pairs are optional.
"mailto:" and hnames are handled in a case-insensitive manner.
In HTML and XML markup, "&" must be represented as "&amp;".
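As an illustration of the markup rule, the following Python sketch escapes a mailto URI before placing it in an HTML attribute. The address, header values and link text are placeholders chosen for this example, not values defined by this document.

```python
from html import escape

# Hypothetical example URI; the address and header values are placeholders.
uri = "mailto:user@example.com?subject=Hello&body=Hi%20there"

# escape() rewrites "&" as the entity "&amp;" (and also escapes <, > and quotes),
# which is the form the separator must take inside HTML or XML markup.
href = escape(uri, quote=True)
anchor = '<a href="' + href + '">Mail us</a>'
print(anchor)
```

The URI itself is unchanged; only its representation inside the markup differs.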
Note: "string" in this document refers to a sequence of 8bit unsigned characters. For the encoding, decoding and parsing steps, when dealing with strings whose characters have a different type and/or width, adjust the steps as necessary to conform.
Let S be the raw string to encode.
Replace all Carriage Return + Line Feed pairs (0x0D + 0x0A) in S with Line Feeds (0x0A).
Then, replace all Carriage Returns (0x0D) in S with Line Feeds (0x0A).
Let NOENCODE be the string "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!*()".
Let HEXITS be the string "0123456789ABCDEF".
Let I be the current position in S starting at 0.
Let RET be the empty string to hold the built value.
While I is less than the length of S:
Let C be the character in S at position I.
If C equals a Line Feed (0x0A):
Append "%0D%0A" to RET.
Else if C is found in NOENCODE:
Append C to RET.
Else:
Append "%" to RET.
Let CI be the 8bit unsigned value (0 to 255) that represents C.
Let F1 be the value of CI >> 4.
Let F2 be the value of CI & 0x0F.
Let A be the character at position F1 in HEXITS.
Let B be the character at position F2 in HEXITS.
Append A to RET.
Append B to RET.
Increment I by 1.
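The encoding steps above can be sketched in Python roughly as follows. The function name encode_hvalue is made up for this example, and the input is assumed to be a bytes object so that each element is an 8bit unsigned value.

```python
NOENCODE = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.!*()"
)
HEXITS = "0123456789ABCDEF"


def encode_hvalue(s: bytes) -> str:
    """Percent-encode raw bytes (hypothetical helper following the steps above)."""
    # Normalize CRLF pairs first, then any remaining lone CRs, to LF.
    s = s.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    ret = []
    for ci in s:  # iterating over bytes yields integers 0..255
        c = chr(ci)
        if ci == 0x0A:
            ret.append("%0D%0A")  # every newline is emitted as an encoded CRLF
        elif c in NOENCODE:
            ret.append(c)
        else:
            # High nibble, then low nibble, as uppercase hex digits.
            ret.append("%" + HEXITS[ci >> 4] + HEXITS[ci & 0x0F])
    return "".join(ret)


print(encode_hvalue("a b\r\nc+\u221a".encode("utf-8")))
```

A text value would first be converted to UTF-8 bytes, as in the call at the end.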
A "+" in a mailto URI does not represent a space. It represents itself, and most clients treat it correctly even when it is not encoded. However, "+" should be encoded as "%2B" whenever possible, because the URI data might end up in an HTTP webmail compose URI, where an unencoded "+" is decoded to a space. For the same reason, browsers should generate spaces as "%20" instead of "+" for forms that use action="mailto:".
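The difference between the two conventions can be seen with Python's standard urllib.parse functions. This is only an illustration of form-style versus percent-style handling of "+" and spaces; it is not the encoder defined in this document, and the sample text is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "1 + 1"

# Percent-style: space becomes %20 and "+" becomes %2B, as recommended above.
percent_style = quote(text, safe="")

# Form-style: space becomes "+", which a mailto client would show as a literal "+".
form_style = quote_plus(text)

print(percent_style)
print(form_style)

# A form-style decoder turns an unencoded "+" into a space;
# a plain percent-decoder leaves it alone.
print(unquote_plus("a+b"))
print(unquote("a+b"))
```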
Just like hvalues, hnames can contain full UTF-8 sequences. This allows clients to support Unicode hnames if needed. For example, mailto:?%E2%88%9A=%E2%88%9A is valid in a mailto URI.
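As a small check of the example above, this Python sketch percent-encodes the square root character (U+221A) as UTF-8 and builds the same URI. It uses the standard urllib.parse.quote function for brevity rather than the encoder described in this document.

```python
from urllib.parse import quote, unquote

hname = "\u221a"  # SQUARE ROOT character, three bytes in UTF-8

# quote() encodes the UTF-8 bytes of the character as %XX triplets.
encoded = quote(hname, safe="")
uri = "mailto:?" + encoded + "=" + encoded
print(uri)

# Decoding the triplets as UTF-8 gives the original character back.
print(unquote(encoded) == hname)
```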
Decoding data
Let S be the encoded string to decode.
Let I be the current position in S, starting at 0.
Let RET be the empty string to hold the built value.
Let HEXITS be the string "0123456789ABCDEF".
While I is less than the length of S:
Let C be the character at position I.
If C equals "%" and I + 2 is less than the length of S:
Let F1 be the uppercase version of the character at position I + 1.
Let F2 be the uppercase version of the character at position I + 2.
Let A be the position of F1 in HEXITS.
Let B be the position of F2 in HEXITS.
If F1 or F2 was not found in HEXITS:
Append C to RET.
Else:
Append the character represented by the 8bit unsigned value (0 to 255) of (A * 16 + B) to RET.
Increment I by 2.
Else:
Append C to RET.
Increment I by 1.
Replace all Carriage Return + Line Feed pairs (0x0D + 0x0A) in RET with Line Feeds (0x0A).
Then, replace all Carriage Returns (0x0D) in RET with Line Feeds (0x0A).
RET will be a string of raw UTF-8 sequences with all newline pairs and single newlines normalized to \n.
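A minimal Python sketch of the decoding steps above follows. The function name decode_hvalue is made up for this example. It assumes the encoded input is a str and returns bytes; any non-ASCII character that appears unencoded in the input is assumed to be UTF-8.

```python
HEXITS = "0123456789ABCDEF"


def decode_hvalue(s: str) -> bytes:
    """Percent-decode an encoded string (hypothetical helper following the steps above)."""
    ret = bytearray()
    i = 0
    while i < len(s):
        c = s[i]
        if c == "%" and i + 2 < len(s):
            a = HEXITS.find(s[i + 1].upper())
            b = HEXITS.find(s[i + 2].upper())
            if a < 0 or b < 0:
                # Not a valid %XX triplet: keep the "%" as a literal.
                ret += c.encode("utf-8")
            else:
                ret.append(a * 16 + b)
                i += 2  # skip the two hex digits just consumed
        else:
            ret += c.encode("utf-8")
        i += 1
    # Normalize CRLF pairs first, then lone CRs, to LF.
    return bytes(ret.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))


print(decode_hvalue("a%20b%0D%0Ac%2b%E2%88%9A").decode("utf-8"))
```

Note that, because of the "I + 2 is less than the length of S" test, a "%" with fewer than two characters after it is copied through unchanged.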
Handling of duplicate hnames
In a mailto URI, there can be more than one hname with the same name. When a client parses a mailto URI to generate a TO, CC, BCC, SUBJECT or BODY value, the following rules should be followed.
TO: Join all non-empty hvalues with "%2C%20".
CC: Join all non-empty hvalues with "%2C%20".
BCC: Join all non-empty hvalues with "%2C%20".
SUBJECT: Use only the last subject hvalue even if it's empty and even if a previous one is not empty.
BODY: Join the first non-empty body hvalue and all body hvalues (even if they're empty) after that with "%0D%0A".
If the client supports other address-based hnames, the TO, CC and BCC rules can be used if desired.
Duplicate hnames are allowed for use with clients that have hvalue length restrictions but still want to support acceptance of large amounts of data for an hname. Even clients that don't have length restrictions should support duplicate hnames for compatibility.
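The duplicate-handling rules can be sketched as three small Python helpers. The function names are made up for this example, and each one assumes it receives the still-encoded hvalues for a single hname, in the order they appear in the URI.

```python
def join_addresses(hvalues):
    """TO, CC and BCC rule: join the non-empty values with an encoded ', '."""
    return "%2C%20".join(v for v in hvalues if v)


def last_subject(hvalues):
    """SUBJECT rule: only the last value counts, even if it is empty."""
    return hvalues[-1] if hvalues else ""


def join_bodies(hvalues):
    """BODY rule: skip leading empty values, then join the rest with an encoded CRLF."""
    start = 0
    while start < len(hvalues) and not hvalues[start]:
        start += 1
    return "%0D%0A".join(hvalues[start:])


print(join_addresses(["a@example.com", "", "b@example.com"]))
print(repr(last_subject(["first", ""])))
print(join_bodies(["", "one", "", "two"]))
```

In the last call, the empty body between "one" and "two" is kept, so the result contains two consecutive encoded line breaks.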
Pre-Parsing
Let S be the incoming string.
Let P be an empty string to store the &-separated list of hname=hvalue pairs.
If S starts with a case-insensitive "mailto:":
Let SUB be the substring of S starting at and including the character at position 4 to the end of S.
Replace the first ':' in SUB with '='.
If one or more '?' are present in SUB:
Replace only the first '?' with '&'.
Assign SUB to P.
Else:
Fail.
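The pre-parsing steps above might look like this in Python. The function name preparse and the use of ValueError for the "Fail" case are choices made for this sketch only.

```python
def preparse(s: str) -> str:
    """Turn a mailto URI into an &-separated list of hname=hvalue pairs."""
    if s[:7].lower() != "mailto:":
        raise ValueError("not a mailto URI")  # the "Fail" branch
    # Position 4 keeps the trailing "to:" of the scheme, so the address
    # part becomes an ordinary "to=" pair after the next replacement.
    sub = s[4:]
    sub = sub.replace(":", "=", 1)  # first ':' only
    sub = sub.replace("?", "&", 1)  # first '?' only, if there is one
    return sub


print(preparse("MAILTO:a@example.com?subject=Hi?&cc=b@example.com"))
```

The case of the scheme is preserved in the result ("TO=" here), which is harmless because hnames are compared case-insensitively during parsing.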
Parsing
Let P be the string with the &-separated list of hname=hvalue pairs.
Split P by '&' into array HLIST.
Let I be the current position in HLIST starting at 0.
Let TO, SUBJECT, BODY, CC and BCC be empty strings to store the encoded values.
While I is less than the length of HLIST:
Let S be the string at position I in HLIST.
Let EQ be the position of the first '=' in S.
If a '=' is found in S:
Let HNAME be a string representing the lowercase version of a substring of S starting at and including the character at position 0 on up to, but not including the character at position EQ.
Let HVALUE be a string representing a substring of S starting at and including the character at position EQ + 1 to the end of S.
If HNAME equals "to":
If HVALUE is not empty:
If TO is not empty:
Append "%2C%20" to TO.
Append HVALUE to TO.
Else if HNAME equals "cc":
If HVALUE is not empty:
If CC is not empty:
Append "%2C%20" to CC.
Append HVALUE to CC.
Else if HNAME equals "bcc":
If HVALUE is not empty:
If BCC is not empty:
Append "%2C%20" to BCC.
Append HVALUE to BCC.
Else if HNAME equals "subject":
Assign HVALUE to SUBJECT.
Else if HNAME equals "body":
If (HVALUE is empty and BODY is empty) equals false:
If BODY is not empty:
Append "%0D%0A" to BODY.
Append HVALUE to BODY.
Increment I by 1.
Let DTO, DSUBJECT, DBODY, DCC and DBCC be strings of raw UTF-8 sequences representing the decoded versions of TO, SUBJECT, BODY, CC and BCC respectively.
Optionally, reencode from DTO, DSUBJECT, DBODY, DCC and DBCC to generate normalized TO, SUBJECT, BODY, CC and BCC values where:
Characters that were encoded, but don't need to be, are not encoded.
Characters that were not encoded, but need to be, are encoded.
There are no stray %0D or %0A and only %0D%0A.
The decoded values are what mail clients use (after converting them to the needed encoding) to fill in a compose form's text fields.
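The parsing loop above can be sketched in Python as follows. The function name parse_pairs and the dictionary result are choices made for this example. The sketch covers only the loop that collects the encoded TO, CC, BCC, SUBJECT and BODY values; decoding and optional re-encoding are left as separate steps.

```python
def parse_pairs(p: str) -> dict:
    """Collect encoded header values from an &-separated hname=hvalue list."""
    fields = {"to": "", "cc": "", "bcc": "", "subject": "", "body": ""}
    for s in p.split("&"):
        eq = s.find("=")
        if eq < 0:
            continue  # items without '=' are ignored
        hname = s[:eq].lower()
        hvalue = s[eq + 1:]
        if hname in ("to", "cc", "bcc"):
            if hvalue:
                if fields[hname]:
                    fields[hname] += "%2C%20"  # encoded ", "
                fields[hname] += hvalue
        elif hname == "subject":
            fields["subject"] = hvalue  # last one wins, even if empty
        elif hname == "body":
            # Skip empty bodies only until the first non-empty one is seen.
            if hvalue or fields["body"]:
                if fields["body"]:
                    fields["body"] += "%0D%0A"  # encoded CRLF
                fields["body"] += hvalue
    return fields


result = parse_pairs(
    "to=a@example.com&subject=old&CC=c@example.com&to=&to=b@example.com"
    "&subject=&body=&body=one&body=&body=two"
)
print(result)
```

Other hnames are silently ignored here; a client that supports more headers would add branches for them.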
Click on the link. In your mail client's compose form, you should get the following results.
To:
Cc:
Bcc:
Subject:
Note that there may be extra lines in the body in your mail client if it has signature support.
Also note that for the TO, CC and BCC fields, user input in most clients requires properly escaped addresses, where certain characters are escaped with "\". However, some clients allow unescaped user input and will, after decoding, unescape characters escaped with "\" when filling in the fields. As long as the intended headers are produced in the outgoing message, the client can use whatever user-input method it prefers.
Implementations of parsing, decoding and encoding
In MailtoURIParserPack.zip are C++, D, Java, JavaScript, Perl, Python and Ruby MailtoURIParser classes that use the rules in this document.