Many web servers supply incorrect Content-Type header fields
with their HTTP responses. In order to be compatible with these servers, User
Agents consider the content of HTTP responses as well as the Content-Type
header fields when determining the effective media type of the response. This
document describes an algorithm for determining the effective media type of
HTTP responses that balances security and compatibility considerations.
The HTTP Content-Type header field indicates the media type of an HTTP
response. However, many HTTP servers supply a Content-Type that does not match
the actual contents of the response. Historically, web browsers have tolerated
these servers by examining the content of HTTP responses in addition to the
Content-Type header field to determine the effective media type of the
response.
Without a clear specification of how to "sniff" the media type, each User Agent implementor was forced to reverse engineer the behavior of other User Agents and to develop his or her own algorithm. These divergent algorithms have led to a lack of interoperability between User Agents and to security issues when the server intends an HTTP response to be interpreted as one media type but some User Agents interpret the response as another media type.
These security issues are most severe when an "honest" server lets
potentially malicious users upload files and then serves the contents of those
files with a low-privilege media type (such as text/plain or image/jpeg).
(Malicious servers, of course, can specify an arbitrary media type in the
Content-Type header field.) In the absence of media type sniffing, this
user-generated content would not be interpreted as a high-privilege media type,
such as text/html. However, if a User Agent does interpret a low-privilege
media type, such as image/gif, as a high-privilege media type, such as
text/html, then the User Agent has created a privilege escalation vulnerability in
the server. For example, a malicious user might be able to leverage content
sniffing to mount a cross-site scripting attack by including JavaScript code in
the uploaded file that a User Agent treats as text/html.
This document describes a content sniffing algorithm that carefully balances
the compatibility needs of User Agent implementors with the security
constraints imposed by existing web content. The algorithm has been
constructed with reference to content sniffing algorithms present in popular
User Agents, an extensive database of existing web content, and metrics
collected from implementations deployed to a sizable number of users, as
described in
Whenever possible, User Agents SHOULD NOT employ a content sniffing algorithm. However, if a User Agent does employ a content sniffing algorithm, the User Agent SHOULD use the algorithm in this document because using a different content sniffing algorithm than servers expect causes security problems. For example, if a server believes that the client will treat a contributed file as an image (and thus treat it as benign), but a User Agent believes the content to be HTML (and thus privileged to execute any scripts contained therein), an attacker might be able to steal the user's authentication credentials and mount other cross-site scripting attacks.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("MUST", "SHOULD", "MAY", etc.) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant.
The explicit media type metadata associated with a sequence of octets depends on the protocol that was used to fetch the octets.
For octets received via HTTP, the Content-Type HTTP header field, if
present, indicates the media type. Let the official-type be the media type
indicated by the HTTP Content-Type header field, if present. If the
Content-Type header field is absent or if its value cannot be interpreted as a
media type (e.g. because its value doesn't contain a U+002F SOLIDUS ('/')
character), then there is no official-type. (Such messages are invalid
according to RFC2616.)
If an HTTP response contains multiple Content-Type header
fields, the User Agent MUST use the textually last Content-Type header
field as the official-type. For example, if the last
Content-Type header field contains the value "foo", then there is no
official media type because "foo" cannot be interpreted as a media
type (even if the HTTP response contains another Content-Type header
field that could be interpreted as a media type).
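The rules above for HTTP can be sketched as follows. This is a minimal illustration, not a normative implementation: the function name `official_type`, the representation of headers as a list of (name, value) pairs in wire order, and the stripping of parameters after the first semicolon are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def official_type(headers: List[Tuple[str, str]]) -> Optional[str]:
    """Return the official-type of an HTTP response, or None if there is none.

    `headers` is a hypothetical list of (name, value) pairs in wire order.
    """
    # Only the textually last Content-Type header field is considered.
    values = [value for (name, value) in headers if name.lower() == "content-type"]
    if not values:
        return None
    last = values[-1]
    # Assumption of this sketch: parameters such as "; charset=UTF-8" are dropped.
    media_type = last.split(";", 1)[0].strip()
    # A value without a U+002F SOLIDUS cannot be interpreted as a media type.
    if "/" not in media_type:
        return None
    # Lowercase so that later comparisons are case-insensitive.
    return media_type.lower()
```

With this sketch, a response carrying `Content-Type: text/html` followed by `Content-Type: foo` has no official-type, because only the last field is examined and "foo" contains no solidus.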
For octets fetched from the file system, User Agents should use platform-specific conventions (e.g., operating system file extension/type mappings) to determine the official-type.
It is essential that file extensions are not used for determining the media type for octets fetched over HTTP because, in some cases, file extensions can be supplied by malicious parties. For example, most PHP installations let the attacker append arbitrary path information to URLs (e.g., http://example.com/foo.php/bar.html) and thereby determine the file extension.
For octets fetched over some other protocols, e.g. FTP [RFC959], there is no type information.
Comparisons between media types, as defined by MIME specifications, are done in an ASCII case-insensitive manner. [RFC2046]
The User Agent MUST use the following algorithm to determine the sniffed-type of a sequence of octets:
Content-Type header field and the value of the last such header field has octets that exactly match the octets contained in one of the following lines:
| Bytes in Hexadecimal | Textual Representation |
|---|---|
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain |
| 74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 49 53 4f 2d 38 38 35 39 2d 31 | text/plain; charset=ISO-8859-1 |
| 74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 69 73 6f 2d 38 38 35 39 2d 31 | text/plain; charset=iso-8859-1 |
| 74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 55 54 46 2d 38 | text/plain; charset=UTF-8 |
The algorithm detects these exact octet sequences because some older installations of Apache contain a bug that causes them to supply one of these Content-Type headers when serving files with unrecognized file extensions.
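The exact-octet comparison can be illustrated with a short sketch. The names `APACHE_BUG_VALUES` and `matches_apache_bug_value` are hypothetical, and the sketch only covers the comparison itself, not what the algorithm does after a match. The comparison is deliberately byte-for-byte, so a value that differs only in case or spacing does not match.

```python
# The four header values from the table, written as ASCII byte strings.
APACHE_BUG_VALUES = frozenset({
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
})


def matches_apache_bug_value(last_content_type: bytes) -> bool:
    """Return True if the raw octets of the last Content-Type value
    exactly equal one of the four values in the table."""
    return last_content_type in APACHE_BUG_VALUES
```

For example, `b"text/plain; charset=utf-8"` is not in the set, because the table lists only the upper-case spelling "UTF-8".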
This section defines the rules for distinguishing whether a resource is text or binary.
Waiting for 512 octets to arrive causes the text-or-binary algorithm to be deterministic for a given sequence of octets. However, in some cases, the User Agent might need to wait an arbitrary length of time for these octets to arrive. User Agents SHOULD wait for 512 octets to arrive, when feasible.
| Bytes in Hexadecimal | Description |
|---|---|
| FE FF | UTF-16BE BOM |
| FF FE | UTF-16LE BOM |
| EF BB BF | UTF-8 BOM |
| Binary Data Byte Ranges |
|---|
| 0x00–0x08 |
| 0x0B |
| 0x0E–0x1A |
| 0x1C–0x1F |
It is critical that this step never return a scriptable type (e.g., text/html), because doing so would allow a privilege escalation attack.
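The two tables above translate directly into a pair of predicates. The sketch below is illustrative only: the names `starts_with_bom` and `contains_binary_octet` are hypothetical, and the 512-octet limit is taken from the waiting rule stated earlier. How the results map to a sniffed-type is left out, since the sketch only encodes the tables.

```python
# Byte order marks from the first table.
BOMS = (b"\xFE\xFF", b"\xFF\xFE", b"\xEF\xBB\xBF")

# Binary data byte ranges from the second table (all ranges inclusive).
BINARY_OCTETS = frozenset(
    list(range(0x00, 0x09))    # 0x00-0x08
    + [0x0B]                   # 0x0B
    + list(range(0x0E, 0x1B))  # 0x0E-0x1A
    + list(range(0x1C, 0x20))  # 0x1C-0x1F
)


def starts_with_bom(data: bytes) -> bool:
    """Return True if the data begins with a UTF-16BE, UTF-16LE, or UTF-8 BOM."""
    return data.startswith(BOMS)


def contains_binary_octet(data: bytes, limit: int = 512) -> bool:
    """Return True if any of the first `limit` octets is a binary data byte."""
    return any(octet in BINARY_OCTETS for octet in data[:limit])
```

Note that tab (0x09), line feed (0x0A), form feed (0x0C), carriage return (0x0D), and escape (0x1B) fall outside the listed ranges and therefore do not count as binary data.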
"WS" means "whitespace", and allows insignificant whitespace to be skipped when sniffing for a type signature.
"_>" means "space-or-bracket", and allows HTML tag names to terminate with either a space or a greater-than sign.
The table used by the above algorithm is:
| Mask in Hex | Pattern in Hex | Sniffed Type | Security | Comment | |
|---|---|---|---|---|---|
| FF FF FF DF DF DF DF DF DF DF FF DF DF DF DF FF | WS 3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C _> | text/html | Scriptable | <!DOCTYPE HTML | |
| FF FF DF DF DF DF FF | WS 3C 48 54 4D 4C _> | text/html | Scriptable | <HTML | |
| FF FF DF DF DF DF FF | WS 3C 48 45 41 44 _> | text/html | Scriptable | <HEAD | |
| FF FF DF DF DF DF DF DF FF | WS 3C 53 43 52 49 50 54 _> | text/html | Scriptable | <SCRIPT | |
| FF FF DF DF DF DF DF DF FF | WS 3C 49 46 52 41 4d 45 _> | text/html | Scriptable | <IFRAME | |
| FF FF DF FF FF | WS 3C 48 31 _> | text/html | Scriptable | <H1 | |
| FF FF DF DF DF FF | WS 3C 44 49 56 _> | text/html | Scriptable | <DIV | |
| FF FF DF DF DF DF FF | WS 3C 46 4f 4e 54 _> | text/html | Scriptable | <FONT | |
| FF FF DF DF DF DF DF FF | WS 3C 54 41 42 4c 45 _> | text/html | Scriptable | <TABLE | |
| FF FF DF FF | WS 3C 41 _> | text/html | Scriptable | <A | |
| FF FF DF DF DF DF DF FF | WS 3C 53 54 59 4c 45 _> | text/html | Scriptable | <STYLE | |
| FF FF DF DF DF DF DF FF | WS 3C 54 49 54 4c 45 _> | text/html | Scriptable | <TITLE | |
| FF FF DF FF | WS 3C 42 _> | text/html | Scriptable | <B | |
| FF FF DF DF DF DF FF | WS 3C 42 4f 44 59 _> | text/html | Scriptable | <BODY | |
| FF FF DF DF FF | WS 3C 42 52 _> | text/html | Scriptable | <BR | |
| FF FF DF FF | WS 3C 50 _> | text/html | Scriptable | <P | |
| FF FF FF FF FF FF | WS 3C 21 2d 2d _> | text/html | Scriptable | <!-- | |
| FF FF FF FF FF FF | WS 3C 3f 78 6d 6c | text/xml | Scriptable | <?xml (Note the case sensitivity [mask = FF instead of DF] and lack of trailing _>) | |
| FF FF FF FF FF | 25 50 44 46 2D | application/pdf | Scriptable | The string "%PDF-", the PDF signature. | |
| FF FF FF FF FF FF FF FF FF FF FF | 25 21 50 53 2D 41 64 6F 62 65 2D | application/postscript | Safe | The string "%!PS-Adobe-", the PostScript signature. | |
| FF FF 00 00 | FE FF 00 00 | text/plain | n/a | UTF-16BE BOM | |
| FF FF 00 00 | FF FE 00 00 | text/plain | n/a | UTF-16LE BOM | |
| FF FF FF 00 | EF BB BF 00 | text/plain | n/a | UTF-8 BOM | |
| FF FF FF FF FF FF | 47 49 46 38 37 61 | image/gif | Safe | The string "GIF87a", a GIF signature. | |
| FF FF FF FF FF FF | 47 49 46 38 39 61 | image/gif | Safe | The string "GIF89a", a GIF signature. | |
| FF FF FF FF FF FF FF FF | 89 50 4E 47 0D 0A 1A 0A | image/png | Safe | The PNG signature. | |
| FF FF FF | FF D8 FF | image/jpeg | Safe | A JPEG SOI marker followed by an octet of another marker. | |
| FF FF | 42 4D | image/bmp | Safe | The string "BM", a BMP signature. | |
| FF FF FF FF 00 00 00 00 FF FF FF FF FF FF | 52 49 46 46 00 00 00 00 57 45 42 50 56 50 | image/webp | Safe | "RIFF" followed by four bytes, followed by "WEBPVP". | |
| FF FF FF FF | 00 00 01 00 | image/vnd.microsoft.icon | Safe | A Windows Icon signature. | |
| FF FF FF FF FF | 4F 67 67 53 00 | application/ogg | Safe | An Ogg audio or video signature. | |
| FF FF FF FF 00 00 00 00 FF FF FF FF | 52 49 46 46 00 00 00 00 57 41 56 45 | audio/wave | Safe | "RIFF" followed by four bytes, followed by "WAVE". | |
| FF FF FF FF | 1A 45 DF A3 | video/webm | Safe | The WebM signature [TODO: Use more octets?] | |
| FF FF FF FF FF FF FF | 52 61 72 20 1A 07 00 | application/x-rar-compressed | Safe | A RAR archive. | |
| FF FF FF FF | 50 4B 03 04 | application/zip | Safe | A ZIP archive. | |
| FF FF FF | 1F 8B 08 | application/x-gzip | Safe | A GZIP archive. |
MP3 audio.
User Agents MAY support additional types if necessary, by implicitly adding to the above table. However, User Agents SHOULD NOT use any other patterns for types already mentioned in the table above because this could then be used for privilege escalation (where, e.g., a server uses the above table to determine that content is not HTML and thus safe from cross-site scripting attacks, but then a User Agent detects it as HTML anyway and allows script to execute). In extending this table, User Agents SHOULD NOT introduce any privilege escalation vulnerabilities.
The column marked "security" is used by the algorithm in the "text or binary" section, to avoid sniffing text/plain content as a type that can be used for a privilege escalation attack.
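A masked comparison against rows of this table might look like the sketch below. It is a rough illustration under several assumptions: an octet matches when `(octet & mask) == pattern`, "WS" skips zero or more whitespace octets, the whitespace set is 0x09, 0x0A, 0x0C, 0x0D, and 0x20, and only three rows of the table are reproduced. The names `ROWS`, `matches`, and `sniff` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, Union

WS = "WS"       # placeholder: skip insignificant leading whitespace
TAG_END = "_>"  # placeholder: space (0x20) or greater-than sign (0x3E)

# Assumed whitespace set for this sketch.
WHITESPACE = frozenset({0x09, 0x0A, 0x0C, 0x0D, 0x20})

Pattern = Sequence[Union[int, str]]

# Three rows copied from the table: (mask, pattern, sniffed type).
ROWS: List[Tuple[bytes, Pattern, str]] = [
    (bytes.fromhex("FF FF DF DF DF DF FF"),
     [WS, 0x3C, 0x48, 0x54, 0x4D, 0x4C, TAG_END], "text/html"),
    (bytes.fromhex("FF FF FF FF FF"),
     [0x25, 0x50, 0x44, 0x46, 0x2D], "application/pdf"),
    (bytes.fromhex("FF FF FF FF 00 00 00 00 FF FF FF FF"),
     list(b"RIFF\x00\x00\x00\x00WAVE"), "audio/wave"),
]


def matches(data: bytes, mask: bytes, pattern: Pattern) -> bool:
    """Return True if `data` matches `pattern` under `mask`."""
    pos = 0
    for m, p in zip(mask, pattern):
        if p == WS:
            # Skip any run of whitespace; consumes no fixed number of octets.
            while pos < len(data) and data[pos] in WHITESPACE:
                pos += 1
            continue
        if pos >= len(data):
            return False
        octet = data[pos]
        if p == TAG_END:
            if octet not in (0x20, 0x3E):
                return False
        elif (octet & m) != p:
            # A mask of DF clears bit 5, so ASCII letters compare case-insensitively.
            return False
        pos += 1
    return True


def sniff(data: bytes) -> Optional[str]:
    """Return the sniffed type of the first matching row, or None."""
    for mask, pattern, media_type in ROWS:
        if matches(data, mask, pattern):
            return media_type
    return None
```

In this sketch, `sniff(b"  <html>")` yields "text/html", while `sniff(b"<htmlx>")` yields None because the tag name is not followed by a space or a greater-than sign.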
This section defines whether a sequence of n octets matches the signature for MP4.
box-size/4 - 1 (inclusive):
4*i through 4*i + 2 (inclusive, zero-based) of the sequence are the sequence 0x6D 0x70 0x34 (the ASCII string "mp4"), then return that the sequence does match the signature for MP4 and abort these steps.
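Only the loop over four-octet words survives in the steps above, so the following sketch fills in the rest with clearly labeled assumptions: that box-size is the big-endian 32-bit integer in the first four octets, that octets 4 through 7 must read "ftyp", that the loop starts at word 2, and that word 3 (assumed to be a minor version field) is skipped. The function name `matches_mp4_signature` is hypothetical.

```python
import struct


def matches_mp4_signature(data: bytes) -> bool:
    """Return True if `data` appears to match the MP4 signature (sketch)."""
    # Assumption: at least a size field, "ftyp", and one brand are needed.
    if len(data) < 12:
        return False
    # Assumption: box-size is a big-endian unsigned 32-bit integer.
    (box_size,) = struct.unpack(">I", data[:4])
    if box_size % 4 != 0 or len(data) < box_size:
        return False
    # Assumption: the first box must be an "ftyp" box.
    if data[4:8] != b"ftyp":
        return False
    # From the text: i runs up to box-size/4 - 1 (inclusive), checking
    # octets 4*i through 4*i + 2 for the ASCII string "mp4".
    # Assumption: the loop starts at word 2 and skips word 3.
    for i in range(2, box_size // 4):
        if i == 3:
            continue
        if data[4 * i : 4 * i + 3] == b"mp4":
            return True
    return False
```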
This section defines the rules for sniffing images specifically.
This section defines the rules for sniffing videos specifically.
This section defines the rules for sniffing fonts specifically.
TODO
Otherwise, let the sniffed-type be the official-type and abort these steps.
User Agents are allowed, by the first step of this algorithm, to wait until the first 512 octets have arrived.
| Bytes in Hexadecimal | Requirement | Comment |
|---|---|---|
| 72 73 73 | Let the sniffed-type be "application/rss+xml" and abort these steps. | rss |
| 66 65 65 64 | Let the sniffed-type be "application/atom+xml" and abort these steps. | feed |
| 72 64 66 3A 52 44 46 | Continue to the next step in this algorithm. | rdf:RDF |
If none of the octet sequences above match the octets in s starting at pos, then let the sniffed-type be "text/html" and abort these steps.
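The table and the fallback rule can be sketched as a single lookup. The function name `classify_at` is hypothetical, as is the convention of returning `None` to mean "continue to the next step"; `s` and `pos` are the octet sequence and position referred to above.

```python
from typing import Optional


def classify_at(s: bytes, pos: int) -> Optional[str]:
    """Return the sniffed-type for the octets in `s` starting at `pos`,
    or None when the algorithm should continue to its next step."""
    if s.startswith(b"rss", pos):
        return "application/rss+xml"
    if s.startswith(b"feed", pos):
        return "application/atom+xml"
    if s.startswith(b"rdf:RDF", pos):
        # The table says to continue rather than decide here.
        return None
    # None of the listed octet sequences matched.
    return "text/html"
```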
For efficiency reasons, implementations might wish to implement this algorithm and the algorithm for detecting the character encoding of HTML documents in parallel.
TODO
Thanks to Alfred Hönes, Boris Zbarsky, David Singer, Mark Pilgrim, and Russ Cox.