Lachy’s Log

The Content-Language Pragma Directive

2013-09-07T16:35:48Z

This rationale is written in defence of a technically sound and reasoned approach to dealing with the Content-Language pragma directive issue within the HTML Working Group. ISSUE-88 is a request for permitting multiple language tags to be used as the value of the Content-Language pragma directive. This article argues that this change proposal is unsupported by logic or reason, and resolving in its favour will have an overall negative effect for both authors and implementers.

Summary

This summary is presented as an overview of the arguments presented throughout this article. The supporting rationale in favour of these arguments is presented later.

The change proposal is based upon the false premise that the Content-Language HTTP header and pragma directive are equivalent.
The HTTP header is used to declare the languages of the intended audience; the only defined function of the pragma directive is to be used as a fallback language in the absence of the lang attribute.
The use of the pragma directive as part of server configuration is out of scope of HTML. Specific server side implementation choices need not affect the conformance definition.
The pragma directive only fulfils its purpose of providing a fallback language when one language tag is specified. Multiple language tags are, by definition of the implementation requirements, not useful or beneficial.
There are no reasons given for why it is beneficial to leave the pragma directive in the document when the lang attribute is present on the root element.
Failing to offer a warning about its presence in all cases would continue to mislead the author about its legitimacy.
The inconsistency of when warnings are issued would be confusing to authors. It is better to offer a consistent warning about the presence of a redundant feature.
The defined effect, per the implementation requirements, of declaring multiple language tags is identical to that of omitting the pragma directive entirely. No reasons are given to explain why declaring multiple language tags is useful.
The syntax of the Content-Language HTTP header field is not affected by the definition of the distinct Content-Language pragma directive in HTML, with which it only shares a common name and does not share significant functionality. It is reasonable for this distinct feature to use a distinct conforming syntax that is suitable for its purpose.
No reason is given explaining why only emitting the warning under specific circumstances, as opposed to the current specification requirement, would serve better in encouraging authors to use the lang attribute instead.
The proposed replacement specification text contains unjustified changes, inconsistencies, unimplementable requirements and is overall inappropriate for use in the specification.
The claimed positive ~~benefits~~ effects are unsupported by evidence and, in several cases, blatantly incorrect.
In practice, very few authors use multiple language tags in the pragma directive, and doing so is not useful. Restricting the syntax to one language would not have a significant negative impact.

Difference Between `Content-Language` HTTP Header and Pragma Directive

The premise of the change proposal is that the Content-Language HTTP header field is functionally equivalent to the Content-Language pragma directive using the meta element. This premise is used to support the idea that both should share the same syntax and client side processing requirements. However, this premise is demonstrably wrong, and thus the change proposal is unsupported by evidence and must be rejected.

In order to demonstrate the differences between the HTTP header and the pragma directive, it is necessary to analyse the purpose and functionality of each and see how they compare.

Declaring the Language of the Intended Audience

The HTTP Content-Language header field is used by HTTP servers to announce the language of the intended audience for a given resource representation. This and other related information exchanged between the client and server can be used for content negotiation based on language. When the server does this, it is important for this information to be included in the HTTP header where it can be seen by both the client and other intermediary servers.

The information declared within the document using the pragma directive is unsuitable for this purpose, as it will not be parsed by intermediary servers that would otherwise utilise the information for caching purposes.

Server Configuration

It has been claimed that the information declared using a pragma directive within the document may be parsed by some server implementations, which subsequently process and echo the value in the Content-Language HTTP header field. Since this header field is allowed to contain multiple language values, it is claimed that this ability is limited by permitting only one language in the pragma directive. However, no evidence has been presented to demonstrate how widely used this feature is, nor why such a feature should even be defined within HTML.

This is a layering violation because information intended for server side processing, and specific implementation details thereof, should not unnecessarily affect the conformance definition of client side HTML. That is, it is out of scope for HTML, as a client side markup language, to define specific processing requirements or features to be used by servers for implementing HTTP features. There is also no inherent need for interoperability between different back end implementation details.

Defining the pragma directive in a way that is optimised for specific server implementation details would be analogous to, for example, defining an ASP specific feature within HTML for use on Microsoft IIS platforms. While server implementations are otherwise free to make any design decision, those design decisions need not affect HTML conformance requirements.

Default Document Language

In practice, Content-Language used within the meta element in the HTML serves as client side metadata. The functionality of Content-Language in this case is restricted entirely to the purpose of specifying a fallback language, to be used in the absence of the lang attribute. This purpose differs significantly from the purpose of declaring the languages of the intended audience.

Declaring multiple languages for the document’s intended audience makes sense in some cases. However, there can only be one default language. Thus, for this purpose, the functionality as defined requires that only a single language value be specified. While the HTTP Content-Language header field is also used for determining the fallback language in cases where it only has a single language value, that is not its primary purpose and is thus not a significant similarity between these two independent features.

Permitting multiple language values to be specified in the pragma directive is at odds with its implementation requirements. Thus, for the client-side metadata functionality of the pragma directive, it is not at all useful to have multiple languages specified, and so it does not make sense for multiple languages to be considered conforming.
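The fallback behaviour described above can be sketched in Python. This is only an illustration under my own assumptions: the function and parameter names are hypothetical, and the order of precedence (lang attribute, then pragma, then a single-valued HTTP header) reflects my reading of the current specification rather than its normative text.

```python
from typing import Optional, Sequence


def document_language(
    lang_attr: Optional[str],
    pragma_default: Optional[str],
    http_languages: Sequence[str] = (),
) -> Optional[str]:
    """Pick the language that applies to the root element.

    lang_attr:       value of the lang attribute on the root element, if any
    pragma_default:  the pragma-set default language, if one was set
    http_languages:  language tags from the HTTP Content-Language header
    """
    # An explicit lang attribute always wins.
    if lang_attr is not None:
        return lang_attr
    # Otherwise fall back to the single language set by the pragma directive.
    if pragma_default is not None:
        return pragma_default
    # The HTTP header only helps as a fallback when it names one language.
    if len(http_languages) == 1:
        return http_languages[0]
    # Several audience languages (or none) give no default language.
    return None
```

Note that a header listing several audience languages contributes nothing here, which is why the fallback role only makes sense for a single value.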

These 3 aspects of the functionality — declaring the language of the intended audience, server side configuration and default document language — clearly illustrate that the premise of this change proposal — the shared functionality between the two features — is fundamentally flawed. The reality is that the in-document Content-Language pragma directive only shares its name with the HTTP header field, while its functionality is closer to that of the lang attribute. And since server side implementation details are out of scope of HTML, there is no need for the document conformance definition to permit multiple language values. The solution chosen for addressing this issue must take this into account, and thus reject this change proposal.

Arguments Against the Rationale

The rationale for this change proposal states:

[The current specification] offers no carrot for doing the right thing. while the fallback language effect stops as soon as the author adds lang on the root element, the spec requires conformance checker to continue whining until the http-equiv="Content-Language" meta element has been removed.

The rationale fails to explain the benefit gained by leaving the pragma directive in the document when a lang attribute has been specified on the root element. While leaving it in the document under those circumstances is mostly harmless, it is redundant metadata that the author does not need to include in their document. Failing to offer a warning would continue to mislead the author into thinking that the pragma directive is both acceptable and useful, which it is not.

That it prevents authors from legally using multiple values to replicate the language fallback effect of doing the same thing in a HTTP header — whether they want to replicate the effect of multiple tags or a single tag.

The language fallback effect from using multiple language tags within the value is that there is no default language. This is exactly the same effect as would be achieved by omitting the pragma directive, and so the given reason is blatantly wrong.

i.e. The effect of including a value with multiple languages, like the following:

is identical to that of omitting this pragma directive entirely. This rationale also fails to provide a reason for wanting to replicate this effect by copying the same syntax.
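To make the equivalence concrete, here is a minimal Python sketch of the pragma processing steps as I read them in the current HTML5 draft. The function name and the sample values are hypothetical, and treating a whitespace-only value as "no language" is my own simplification.

```python
from typing import Optional

# The characters HTML treats as space characters.
HTML_SPACE_CHARACTERS = " \t\n\f\r"


def pragma_set_default_language(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language, or None if none is set."""
    # A missing or empty content attribute sets nothing.
    if not content:
        return None
    # Any comma aborts processing, so a list of languages sets nothing.
    if "," in content:
        return None
    # Skip leading whitespace, then collect characters up to the next space.
    stripped = content.lstrip(HTML_SPACE_CHARACTERS)
    token = ""
    for ch in stripped:
        if ch in HTML_SPACE_CHARACTERS:
            break
        token += ch
    return token or None
```

With these steps, a value such as "en, fr" yields exactly the same result as having no pragma directive at all, while a single tag such as "en-US" is used as the default.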

That it underlines the confusion that may exist today, about the nature of lang versus Content-Language, by requiring:

different syntax rules for features that are expected to be identical (HTTP and http-equiv)

similar syntax rules for features that are different (http-equiv and lang)

a warning message which asks authors to “use lang instead” – as if they were juxtaposable alternatives.

In actual fact, the confusion surrounding this issue is the idea that the HTTP header and pragma directive are equivalent, as clearly illustrated by this misguided change proposal. They are different. The HTTP header is used for declaring the languages of the intended audience; the pragma directive is used for specifying a default language.

The lang attribute, on the other hand, is an alternative to the pragma directive when a single language is specified. When multiple languages are specified, there is absolutely no defined effect, and so it serves no valid purpose at all. Therefore, the pragma directive is much closer in functionality to the lang attribute, than it is to the HTTP header, with which it shares its name.

Instead of the above, this change proposal proposes:

the Zero-edit proposal’s warning about using lang instead of Content-Language should be changed into a warning which informs that a fallback language measure has kicked in, and recommend that authors create a language declaration (via lang) rather than relying on the fallback feature. This warning should be shown regardless of whether the fallback comes from http-equiv or from the higher level (HTTP). Justification: Since it is a fallback feature, and with other semantics, there is no guarantee that the author has used it for the language effect.

From the author’s perspective, the inconsistency of issuing the warning about the use of the pragma directive only when the lang attribute is absent would be confusing. The better alternative is to issue a consistent warning (or error) that simply says to remove the pragma directive and use lang instead.

to hold the syntax rules of HTTP (which permits multiple language tags) as the conforming ones (rather than those of lang, which forbids multiple languages), will have the effect of underlining that lang and Content-Language have different purposes. For instance, since the fallback algorithm doesn’t kick in whenever multiple languages are used in the pragma or on the server, there would not be any warning in these cases.

The syntax requirements for the HTTP Content-Language header are not affected by the HTML implementation requirements. Since the lang attribute on the root element and the Content-Language pragma directive with a single language value do have the same effect, which differs significantly from the purpose of the HTTP Content-Language header, and because it is misleading to pretend otherwise, the syntax of the former does not need to match the syntax of the latter.

a carrot: what we want from authors is that they rely on lang (and xml:lang) for specifying the language — when the author does that, he/she should get immediate reward in the form of removal of conformance warning.

This rationale fails to explain why that same effect of encouraging authors to use the lang attribute would not be achieved by a more consistent warning that states to use the lang attribute and remove the pragma directive. There is no benefit gained by leaving the directive in; and merely silencing the validator by inserting a lang attribute does little to discourage the use of the redundant and totally unnecessary pragma directive.

Arguments Against the Proposal Details

The change proposal suggests replacing the terminology for “pragma-set default language” with “pragma-set locale language”. None of the given rationale explains the need for this change in terminology.

The proposed specification text states:

This pragma contains a Content-Language list, whose semantics and syntax is defined in the HTTP spec.

The semantics of the Content-Language header field as defined in RFC 2616 states:

The Content-Language entity-header field describes the natural language(s) of the intended audience for the enclosed entity. Note that this might not be equivalent to all the languages used within the entity-body.

This semantic definition does not match the actual purpose of the Content-Language pragma directive, for specifying a “pragma-set locale language”. Therefore, referring to RFC 2616 for this semantic definition is inappropriate. The syntax requirements from RFC 2616 are also inappropriate, as they define the following ABNF, which is not directly compatible with the syntax of the meta element with http-equiv and content attributes.

Content-Language = "Content-Language" ":" 1#language-tag
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA

For these syntax requirements to be applicable at all, the specification would have to state that the value of the content attribute must match the ABNF production for language-tag. However, see below regarding the syntax defined in BCP 47.
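As a rough illustration, the language-tag production above can be transcribed into a Python regular expression. This is a sketch only: the helper names are hypothetical, and the list check simplifies the HTTP "1#" list rule by ignoring empty elements and trimming spaces and tabs around commas.

```python
import re

# language-tag = primary-tag *( "-" subtag ), each part being 1*8ALPHA
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(value: str) -> bool:
    """True if value matches the RFC 2616 language-tag production."""
    return _LANGUAGE_TAG.fullmatch(value) is not None


def is_language_list(value: str) -> bool:
    """True if value is a comma separated list of at least one language-tag."""
    items = [item.strip(" \t") for item in value.split(",")]
    items = [item for item in items if item]  # empty list elements are ignored
    return len(items) >= 1 and all(is_language_tag(item) for item in items)
```

Under this transcription "en-US" is a valid language-tag, whereas "en, fr" is only valid as a list, which is the distinction the specification would have to make explicit for the content attribute.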

An HTML5 parser processes this list into a known or unknown pragma-set locale language… The Content-Language list may also be defined in a HTTP header, and will then result in a known or unknown HTTP header-set locale language.

The proposed text fails to define what “known or unknown” means in that context. It is not clear how the implementation determines whether a value is known or unknown. The phrasing of the requirement seems to indicate that it would depend upon the result of parsing the value, rather than just the presence or absence of said value. But the parsing requirements do not use such terminology, and so there is no way to determine whether a given value qualifies as known or unknown.

The parsing requirements for the value of this pragma directive are not specified by the change proposal. However, the change proposal also does not state that the existing parsing requirements in the specification are to be removed, replaced or modified in any way. Thus, by adopting the details of this change proposal, the specification would be left in an inconsistent state which says that multiple language values are supported, but where the parsing requirements abort when more than one value is used.

The aforementioned parsing requirements only focus on parsing the value of the pragma directive, and as such, there is no implementation requirement that sets the “HTTP header-set locale language”.

When a document is lacking a language declaration in the form of the lang or xml:lang attribute on the root element, the document’s locale language (pragma-set or HTTP-set) is consulted by the user agent and used as fallback value for the primary document language.

Assuming the value of the “HTTP header-set locale language” comes from the HTTP Content-Language header, this proposed text fails to specify the order of precedence of the values specified in the pragma directive or the HTTP header.

The use of the term “locale language” in this context clashes with the existing use of the term in the specification to refer to the language set by the user in the user agent’s preferences. This term is used in the table within step 7 of the algorithm to determine the character encoding.

The proposed text then goes on to state:

The following info about the HTTP semantics and Content-Language usage, is informative:

However, in the non-normative list given following that statement, RFC 2119 terminology is incorrectly used to describe what appear to be authoring requirements. In particular:

… authors should not define the Content-Language list according to its parser effect, but according to it semantics.

This non-normative example text also incorrectly states that “en-US” would not be parsed into a useful value. However, this value complies with the syntax requirements specified in RFC 2616, BCP 47 and also with the existing parsing requirements in the HTML5 specification.

The proposal states that the following requirement is to be removed:

Conformance checkers will include a warning if this pragma is used. Authors are encouraged to use the lang attribute instead.

The rationale provided does not adequately justify the removal of this warning, nor does it adequately justify replacing it with a more limited warning to be issued only when the pragma directive is used without the lang attribute.

The proposal then states to amend this requirement as follows:

the content attribute must have a value consisting of a valid BCP 47 language tag, or a comma separated list of two or more BCP 47 language tags.

However, the proposal stated earlier that the syntax for the value was defined by RFC 2616. This requirement now conflicts with that by stating that the syntax of the content attribute’s value is defined by BCP 47. This inconsistency negatively affects the quality of the specification.

The proposal states that this note is to be removed:

This pragma is not exactly equivalent to the HTTP Content-Language header, for instance it only supports one language.

The removal of this note would be misleading, because the note itself is factually correct as-is with the current specification, and with the details of this proposal, which, as stated above, leave the parsing requirements unchanged. The proposal fails to include any implementation requirements that actually permit multiple language tags to be used.

It has now been clearly demonstrated that the proposed specification text provided by this change proposal is thoroughly inadequate for its intended purpose. If the specification were to be amended as required by this change proposal, the inconsistency and lack of clarity would negatively affect the ability to read, understand and implement this specification. As such, this proposal should also be rejected on the basis that its proposal details are inadequate. However, if this working group does make the wrong decision to permit multiple language tags, then I ask that the editor be given full editorial discretion to phrase the requirements in a way that more clearly expresses the requirements, rather than being asked to accept the details of this proposal as written.

Arguments Against the Claimed Positive and Negative Effects

More positive: authors can get rid of the warning by adding something — — this is better than a focus on removal of the (over all) harmless Content-Language meta element.

Likewise, authors can get rid of the warning as required by the current specification by removing the meta element. No rationale is provided to explain why the act of removing the pragma directive is significantly more difficult than adding the lang attribute to the root element. Depending on the authoring tool or CMS, both of these actions are likely to be just as easy or just as difficult to perform. This purported benefit is thus unsubstantiated and invalid.

More stable: same syntax as before continue to be permitted.

As documented by the null change proposal, observation of the use of this pragma directive shows that only a very small minority of authors use multiple language values. However, the claimed benefit of continuing to use this syntax is nullified by the fact that, due to the implementation requirements, multiple language values are not at all useful.

More permissive: authors, CMS-es and browsers can continue to take advantage of HTTP-EQUIV’s ability to reference what the HTTP header is/was supposed to be, including replicating its fallback effect.

No rationale is provided to explain why that ability is in any way beneficial.

More correct: the difference between lang and Content-Language is pointed out, while the link between http-equiv and HTTP is emphasized.

As has been demonstrated, this is blatantly wrong. The lang attribute and the Content-Language pragma directive share more in common, in terms of functionality, than the pragma directive and the Content-Language HTTP header field do.

More useful: a warning that a fallback feature has kicked in, is more useful than a warning which focuses on one of the places where the fallback language could potentially kick in from. Why tell the author to “please use lang instead” if the author has already made sure that the lang attribute is in place?

It seems more useful for authors to be informed about the presence of a redundant and useless feature, than to have them continue to mistakenly believe that the pragma directive is in any way useful. However, either way, both of these are highly subjective claims about what may or may not be useful to authors, which cannot be objectively evaluated without supporting data.

Has positive side effect: Encouragement to place a lang attribute on the starttag of the html element will lead authors to actually type in the html root element, instead of relying on the parser to generate it for them.

Relative to the status quo, the zero edit change proposal, or the proposal to make Content-Language non-conforming, the above is not a unique benefit. Both this and the other change proposals require validators to notify the author about the issue and encourage the use of the lang attribute.

More accurate because it does not conceal the problems by introducing an artificial technical and semantic difference between Content-Language from the HTTP header and Content-Language inside the http-equiv meta element.

This accuracy claim is undeniably wrong, given that the significant differences between the HTTP header and pragma directive have already been explained.

Conclusion

Based on the arguments presented in this article, it is clear that the change proposal arguing for multiple language tags to be permitted is misguided, and lacks any significant or valid supporting arguments. Accepting this change proposal would have a serious negative impact upon the quality of the specification. It is therefore my strongly reasoned opinion that the HTMLWG must reject this change proposal either in favour of the status quo, or in favour of making Content-Language entirely non-conforming.

Introducing WebM

2013-09-07T16:36:00Z

Today, Google, in co-operation with Opera, Mozilla, CoreCodec (Matroska developers) and a range of other companies, have announced at Google I/O 2010 that WebM is the new royalty free video codec for the web.

Earlier this year, Google purchased On2, the company that developed a range of video codecs including VP3, VP6, VP7 and VP8. VP3 is a well known codec that formed the basis of Theora. VP6 is a codec supported by Adobe Flash, and VP7 is used by Skype for video conferencing. Their latest offering, VP8, now forms the basis of the new WebM video format. The code for the VP8 codec has been released royalty free under the BSD licence.

WebM, which stands for Web Media, is a format based on 3 technologies:

Container: A variation of Matroska called WebM.
Video codec: VP8.
Audio codec: Vorbis.

The Container Format

Matroska is a widely supported container format, which is able to contain a wide range of codecs, including, among others, h.264, VC-1, Theora, AAC, AC3 and Vorbis. This is due to the high degree of flexibility inherent in the design of Matroska.

Matroska itself is based on a binary markup language called EBML, the design of which was inspired by XML. In short, EBML files contain a header that declares the DocType and version information, followed by a tree of elements and data, marked up using a special binary notation. The Matroska specification defines a range of elements, and their binary notation, that can be used for marking up the data in Matroska files.

The WebM format is a subset of Matroska, which has been optimised for streaming over HTTP.

WebM, which uses the DocType “webm”, can be distinguished from Matroska, which uses the DocType “matroska”. Technically speaking, a valid WebM version 1 file supports a subset of elements from Matroska version 1, and WebM version 2 supports those in addition to some of the additional elements from Matroska version 2.
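The DocType check can be sketched in a few lines of Python. This is a minimal sketch based on my understanding of the EBML header layout: 0x1A45DFA3 is the ID of the EBML header element and 0x4282 the ID of the DocType element within it. The function names are my own, and real files deserve a proper Matroska library rather than this.

```python
from typing import Optional, Tuple

EBML_HEADER_ID = 0x1A45DFA3
DOCTYPE_ID = 0x4282


def read_vint(data: bytes, pos: int, keep_marker: bool) -> Tuple[int, int]:
    """Read an EBML variable-length integer; return (value, next position)."""
    first = data[pos]
    length, mask = 1, 0x80
    # The number of leading zero bits in the first byte gives the length.
    while length <= 8 and not (first & mask):
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid variable-length integer")
    # Element IDs keep the length marker bit; sizes drop it.
    value = first if keep_marker else first & (mask - 1)
    for byte in data[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def read_doctype(data: bytes) -> Optional[str]:
    """Return the DocType declared in the EBML header, e.g. 'webm'."""
    element_id, pos = read_vint(data, 0, keep_marker=True)
    if element_id != EBML_HEADER_ID:
        raise ValueError("not an EBML file")
    size, pos = read_vint(data, pos, keep_marker=False)
    end = pos + size
    # Walk the child elements of the header until DocType is found.
    while pos < end:
        child_id, pos = read_vint(data, pos, keep_marker=True)
        child_size, pos = read_vint(data, pos, keep_marker=False)
        if child_id == DOCTYPE_ID:
            raw = data[pos : pos + child_size]
            return raw.rstrip(b"\x00").decode("ascii")
        pos += child_size
    return None
```

Passing the first few dozen bytes of a file to read_doctype should be enough to tell a WebM file from a general Matroska one, since the header comes first.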

To further optimise WebM for use on the Web, some additional formatting guidelines are imposed upon WebM files, over and above the Matroska counterpart. These guidelines include placing the indexing information at the beginning of the file, and storing keyframes at the beginning of clusters.

The WebM container is only permitted to contain the codecs VP8 and Vorbis, and browsers will not support any other codecs within WebM – not even Theora or h.264. Although there are no technical limitations with WebM that inherently prevent such codecs from being used, this was an intentional decision to improve the usability of WebM.

The idea is that if you have a player that supports WebM, you can be more confident that the file will play without having to install additional codecs. Missing codecs are a problem that has plagued container formats like AVI for years: you can’t easily determine what a file contains until you start playing it. Some AVI files may contain DivX, Xvid, h.264 or a wide range of other codecs.

Benefits of Matroska

Matroska presented some nice benefits over competing container formats, such as MP4, commonly used with h.264, or even Ogg, which is supported by Opera, Firefox and Chrome for Theora and Vorbis. Like Ogg, Matroska is publicly specified and available to use freely, unlike, for example, MP4.

The main benefit of Matroska over Ogg is that the seeking information can be placed at the beginning, making it significantly easier to seek in a WebM file being transferred over HTTP. When the user tries to seek, if that part of the video hasn’t yet downloaded, then the browser needs to request that section from the server.

For Ogg, browsers have to do at least 2 separate requests when a video loads — one to get the beginning of the file and a range request to get the end — before the length of the video can be determined, and before seeking can occur, which then potentially results in additional requests.

For WebM, all the information is presented up front, meaning that if a user seeks the video, the browser knows exactly where in the video to go, or which part of the file to request from the server.
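The effect of having the index up front can be sketched in Python. The cue list below is entirely hypothetical (made-up timestamps and byte offsets), and the function name is my own. The point is only that, once the index is known, a seek maps directly to a single HTTP Range request.

```python
import bisect
from typing import List, Tuple


def range_header_for_seek(cues: List[Tuple[float, int]], seek_seconds: float) -> str:
    """Return a Range header value for the cue at or before seek_seconds.

    cues is a list of (timestamp in seconds, byte offset) pairs sorted by
    timestamp, as a browser might build from the index at the start of a file.
    """
    if not cues:
        raise ValueError("no index information available")
    times = [timestamp for timestamp, _ in cues]
    # Find the last cue whose timestamp is not after the seek position.
    index = max(bisect.bisect_right(times, seek_seconds) - 1, 0)
    offset = cues[index][1]
    # Request everything from that cue onwards.
    return f"bytes={offset}-"


# Hypothetical index: one cue every ten seconds.
example_cues = [(0.0, 4096), (10.0, 250000), (20.0, 610000)]
print(range_header_for_seek(example_cues, 14.2))  # bytes=250000-
```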

This is not to say that Ogg itself is a bad format. Quite the contrary, it’s just optimised for different use cases. Ogg is very good to use as a streaming container format where seeking is not required, or for storing your Vorbis encoded music collection locally, where the player isn’t subject to the overhead of HTTP requests.

WebM, on the other hand, had to be specifically designed for use with the HTML video element served over HTTP, and as such, benefited from the design decisions of Matroska.

Audio and Video Codecs

The VP8 codec provides significant quality enhancements over its predecessors, most notably Theora. Comparisons between Theora and h.264 have shown that the quality of Theora is not up to scratch. Thanks to Google, VP8 has now been released freely.

There haven’t yet been any serious, independent comparisons between h.264 and VP8, so it’s difficult to say which is better. Although h.264 is certainly more mature than VP8, and has a lot more hardware support in existing devices, VP8 is likely to continually improve over the coming years.

The main limitation with VP8 at the moment is the lack of hardware acceleration. Firefox, Opera and Chrome all currently use software decoding of VP8, which means that it can increase CPU usage, particularly for high definition videos, and watching a lot of video will drain your battery more than hardware decoded h.264.

However, Google have announced that they are working with hardware partners, and it’s possible that we’ll see devices shipping with support within a year or two.

Vorbis, of course, has been supported by Firefox, Opera and Chrome for a while already, and so it was a natural choice to use in combination with VP8 in WebM.

YouTube

Over the past few weeks, YouTube has been working to convert many existing videos into WebM. To try this out using a browser that supports WebM, follow the instructions provided by the WebM Project. While not all videos have been re-encoded yet, thousands of videos are already available in WebM format, and will work in Opera, Firefox and Chrome.

Demo Time

Just so you can see for yourself what VP8 looks like, get yourself a copy of the preview releases of Opera, Firefox and Chrome, sit back, relax and watch Elephant’s Dream from the Orange Open Movie Project (website). I encoded this myself from the lossless source files using a special build of ffmpeg with libvpx_vp8 (the VP8 codec library).

Creating Your Own Videos

The absolute easiest way to create your own WebM video is to upload your source video to YouTube and wait for it to be encoded. Other services, including encoding.com and HD Cloud, also offer transcoding services for a small fee.

If you want to encode the videos yourself, you need to get your hands dirty with a tool like ffmpeg with libvpx_vp8, or a commercial alternative. Google have released the source code for libvpx_vp8, and builds of ffmpeg with it should be available shortly. More information is available on the WebM Project tools page.
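As a rough guide to what such an encode looks like, here is a small Python sketch that only builds and prints an ffmpeg command line. It assumes a current ffmpeg build in which the VP8 and Vorbis encoders are named libvpx and libvorbis, which may differ from the special build mentioned above. The file names and the bitrate are placeholders.

```python
import shlex
from typing import List


def webm_encode_command(source: str, target: str, video_bitrate: str = "1M") -> List[str]:
    """Build an ffmpeg argument list for a VP8/Vorbis WebM encode."""
    return [
        "ffmpeg",
        "-i", source,            # input file
        "-c:v", "libvpx",        # VP8 video (encoder name is an assumption)
        "-b:v", video_bitrate,   # target video bitrate
        "-c:a", "libvorbis",     # Vorbis audio
        target,                  # output name ending in .webm
    ]


# Print the command rather than running it, so it can be reviewed first.
print(shlex.join(webm_encode_command("input.mov", "output.webm")))
```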

The Matroska developers have also been working on updating their Matroska muxing software to support the WebM profile. New tools called mkvalidator and mkclean will help you to validate your WebM files, and to clean and remux files that aren’t valid. mkclean will also remux MKV files containing VP8/Vorbis to WebM.

Browser Support

Preview releases are available for Opera, Mozilla Firefox and, of course, Google Chrome.

More details are available on WebMProject.org.

HTML 5: The Markup Language

2013-09-07T16:36:58Z

A relatively new editor’s draft entitled HTML 5: The Markup Language has been proposed by Mike Smith. This draft is an attempt to define the vocabulary and syntax of HTML, without any implementation conformance criteria or associated DOM APIs. It’s being positioned as a replacement, normative definition of the language over the existing HTML 5 spec, and its proponents claim that it’s better for authors. But don’t be deceived; this draft isn’t what it claims to be, and is not really beneficial for the vast majority of web developers.

About the Draft

The document itself is largely generated from two primary sources, with some additional explanatory material included manually. It incorporates selected statements and conformance criteria from the spec itself, which is fine. This is a useful technique to help ensure that it and the spec stay relatively in sync with each other. But it also incorporates the RelaxNG schemas and regular expressions that are being developed for the HTML 5 Validator. This is part of the source code from one particular validator implementation, and it’s important to note that this code was not primarily written for human consumption, but rather machine processing.

Yet, despite this, it is being pushed as a suitable, human readable method for describing the conforming syntax and element content models of HTML. In a sense, it’s analogous to the DTDs used within the HTML 4.01 specification, except that it’s more difficult to read.

From past experience, we know that many web developers were not comfortable reading the DTD syntax, and preferred to check reference guides, tutorials, or ask others on mailing lists or forums to explain things. So the notion that such a document would be useful for the majority of web developers is, frankly, absurd.

But don’t just take my word for it. Let’s take a look at some examples of this notation and see for ourselves. This is the regular expression that describes the conforming DOCTYPE syntax:

doctype =

If that’s not scary enough, how about this which defines the conforming values for the target attribute:

To be fair, it is accompanied by a plain text list of examples of the four predefined values, but simply looking at the examples alone doesn’t tell the reader anything about case insensitivity, nor indicate that other custom values are not allowed to begin with an underscore. The only way to deduce that is from the above RegExp.
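For comparison, the same rule can be stated in a few lines of ordinary code. This is only a rough sketch, not the draft's regular expression: it assumes the four predefined values are _blank, _self, _parent and _top, and the function name is made up for illustration.

```python
# Assumption: these are the four predefined keywords for the target attribute.
PREDEFINED_KEYWORDS = {"_blank", "_self", "_parent", "_top"}


def is_conforming_target(value: str) -> bool:
    """Hypothetical helper: check a target value against the rule described above."""
    # The predefined keywords are matched without regard to case.
    if value.lower() in PREDEFINED_KEYWORDS:
        return True
    # Any other (custom) name must be non-empty and must not begin with an underscore.
    return value != "" and not value.startswith("_")
```

Under these assumptions, "_BLANK" and "sidebar" are accepted, while "_custom" and the empty string are rejected.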

Finally, take a look at the definition of the a element, or any other, and see if you can understand what it means. Personally, I know how the a element is defined in the spec, but even I can’t easily figure out what the schemas are trying to say.

The a element’s content model is actually defined as Transparent in the spec, which you can think of as basically meaning that its content model is inherited from the parent element. (This is a slight oversimplification of its actual meaning, but we can ignore the subtleties for now.) That is, when it’s included as a child of an element that only permits phrasing content, that applies to the a element too. But when its parent permits flow content, so does the a element. If you were able to decipher that on your own from the proposed draft, then well done. I couldn’t.
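To make the simplified idea concrete, here is a minimal sketch of how a transparent content model could be resolved by looking at the parent. The table of content models is a tiny, hypothetical subset chosen for illustration, and falling back to flow content when there is no non-transparent ancestor is an assumption of this sketch, not something taken from the draft.

```python
# Hypothetical, heavily reduced table of content models for illustration only.
CONTENT_MODELS = {
    "div": "flow",
    "p": "phrasing",
    "a": "transparent",
}


def effective_content_model(path):
    """Resolve the content model of the last element in `path`.

    `path` lists element names from the outermost ancestor down to the
    element itself, e.g. ["div", "a"].
    """
    # Walk from the element up towards the root until a
    # non-transparent content model is found.
    for name in reversed(path):
        model = CONTENT_MODELS[name]
        if model != "transparent":
            return model
    # Assumption: with no non-transparent ancestor, allow flow content.
    return "flow"
```

With this table, an a element inside a div resolves to flow content, while an a element inside a p resolves to phrasing content.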

By now, you may be asking, if this proposal isn’t really suitable for web developers, then who is it suitable for? It’s a question that has been asked several times on the mailing list, and yet one that has not yet been adequately answered. I’ll do my best to explain how I see it shortly. But first, there’s a little background to cover.

The Spec Splitters

Within the working group, as expected, many people have a very diverse range of opinions. In particular, a number of individuals share the opinion that the current HTML 5 spec is far too monolithic and that it should be split. There’s nothing inherently wrong with that position, per se. There are indeed sections of the spec that nearly everyone agrees should be, or have already been, separated out into their own specifications.

For instance, XMLHttpRequest was, at one time, part of HTML5. This was taken out a long time ago and moved to the WebApps working group, where it has thrived independently from HTML5 ever since. More recently, the web sockets protocol and API have also been split into their own specs, as have content sniffing, the HTTP Origin header, and more.

The issue is that a number of individuals want the spec split in ways that aren’t entirely sensible. This includes the idea of splitting the spec along the lines of a conforming, declarative language definition and separate implementation requirements. There are even those who would go so far as to say that only the former should be defined, effectively leaving the implementers to fend for themselves. But I’ll spare you from the horror of such extremes, as the group moved beyond that debate long ago, and merely deal with those who want to split the spec.

From a high-level perspective, the concept of splitting the spec along those lines looks reasonable. These two seemingly independent components intuitively feel like they could be defined separately. That is, until you start to appreciate just how intertwined these sections are, and where exactly they want to draw the line.

It is argued that the language spec should only describe the conforming syntax and content models of the HTML markup alone. This would omit any details about how such features are processed and provide limited information about what they do. It would also omit any and all details about the associated DOM APIs.

The semantics of elements and attributes are closely related to what functionality they provide, which is itself closely related to the implementation requirements. Consider, for example, the heading and sectioning elements. Their semantics are useful for providing hierarchical document structures, with varying levels of headings. This is very closely related to the processing requirements for creating an outline. Authors need to know how to mark up their heading structures, and implementers need to know how to interpret them.

Consider also that many of the DOM APIs for many elements reflect the values of the content attributes. The processing requirements for getting and setting such properties are very dependent upon the processing requirements for the attributes themselves, which are in turn dependent upon the conforming values of those attributes.
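As a rough illustration of that dependency, here is a small sketch of a DOM property reflecting a boolean content attribute. The class is a hypothetical stand-in for a real DOM element, and disabled is used only as a familiar example. The getter and setter cannot be written without first knowing how the attribute itself is processed: its presence alone means true, whatever its value.

```python
class SketchElement:
    """Hypothetical stand-in for a DOM element, for illustration only."""

    def __init__(self):
        # Content attributes as they would appear in the markup.
        self.attributes = {}

    @property
    def disabled(self):
        # Getting: the property is true whenever the attribute is present.
        return "disabled" in self.attributes

    @disabled.setter
    def disabled(self, value):
        if value:
            # Setting to true adds the attribute with an empty value.
            self.attributes["disabled"] = ""
        else:
            # Setting to false removes the attribute entirely.
            self.attributes.pop("disabled", None)
```

A definition of the markup that says nothing about how the attribute is processed would leave both halves of this property undefined.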

There are many more examples of such interconnected dependencies, but I won’t try to list them all. Suffice it to say that the problem is that by splitting the spec, it becomes much harder to manage the integration points between these highly interconnected sections, and creates a greater risk of things not being defined well. Such a situation would inevitably lead to interoperability problems, which doesn’t only end up hurting implementers, but everyone involved including authors and users.

The Wedge Strategy

Despite the significant resistance to splitting out the language definition, there has still been a significant push for there to be a document that normatively defines it separately from the implementation requirements, and this draft has been put forth with the intention of doing just that.

However, since the spec has not been split in the way described above, and hopefully won’t be, we are left with a situation where we have two drafts, the HTML5 spec itself and this proposal, each claiming to normatively define the language.

But some people seem to be willing to use this to get their way, even if it means normatively defining the language twice, in two separate specs. This is of course absurd. Having two normative documents, each defining things in its own way, will inevitably lead to conflicts between the two specs, which then raises the question of which takes precedence.

While people claim that it’s possible to define things normatively in two separate specs and keep them in sync, there is no evidence to support that situation and plenty of evidence against it. But suffice it to say that it won’t work and will lead to one of two possible outcomes:

The conforming language definition is split from the main spec, leaving it to be defined only in this proposal. This, as I explained above, would be bad.
The proposal becomes non-normative, leaving the spec itself as the single authoritative normative source. This is what I have been and will continue to push for.

The Audience of the Proposal

As I briefly explained above, given the content of the draft, it is not really suitable for the vast majority of web developers. In fact, its audience is, in practice, despite claims to the contrary, severely limited in scope to a small minority of people who are comfortable with reading complicated schemas and regular expressions, and who actually have some use for them.

Schemas are primarily designed for the purpose of conformance checking. Specifically, tools that read the document and compare it with the grammar described in the schema. This is effectively what validators do, although it should be noted that schemas are not the only means of achieving this goal.

So it is somewhat useful for people writing tools with conformance checking features, since they can, if they choose, incorporate the schemas from the spec into their own tools, or use them as a guide for creating their own. However, it doesn’t provide all the information necessary for such developers, as they will still need to turn to the main spec for many implementation requirements, particularly parsing.

What about Web Developers?

Web developers certainly haven’t been forgotten. Their needs are just as important to address as those of implementers. But I and many others recognise that such developers, many of whom aren’t comfortable with normative spec language, need something specifically targeted at them. For this, there are now two separate, non-normative drafts under development.

The first, currently entitled the HTML 5 Reference, is really a reference guide for web developers that will explain the elements, attributes and their semantics, the syntax and DOM APIs, and provide plenty of explanatory material and examples showing how and why to use each feature. This is a draft that I’m working on, and I have recently started to make some significant progress with it.

The second is a new proposal by Dan Connolly, for which there is currently no draft available. This document is intended to be more of a step-by-step, cookbook-style guide to writing pages using HTML5, with a big focus on the multimedia aspects. For example, it will provide things like:

How to embed a video within a page and provide customised controls using the DOM API.
How to indicate the completion status of a web application using a progress bar.
How to mark up images with captions.
etc.

Google’s Favicon: Yikes!

2013-09-07T16:37:09Z

Google’s new favicon is horrible!

The final icon

According to the Google blog, the icon was based on a submission from André Resende, which I think looks better than the final version. However, I still think both are absolutely horrible.

André Resende’s original submission

As you can see, Google shifted the ‘g’ from the centre to the far left making it look unbalanced. Overall, I find it rather displeasing to the eye. I guess I will have to find some way to either block the icon entirely or make my browser use a custom icon instead.

Selectors API 2nd Last Call

2013-09-07T16:37:19Z

Selectors API was again published as a Last Call on 14 November. For anyone who hasn’t heard about this before, this is an API designed for selecting elements in the DOM by querying using Selectors, as used in CSS.

This is expected to be the last round before proceeding to Candidate Recommendation around mid-December. If you have any further comments to make, you have until 12 December to send them in, preferably to [email protected]; to ensure I don’t miss them, please include [selectors-api] in the subject.

The implementations of the API have progressed nicely, with each of the four major browsers (Firefox, Opera, Safari and IE) expected to include support in their next major release. JavaScript libraries, such as jQuery, are also expected to take advantage of the feature in upcoming releases, which should mean performance improvements for users with updated browsers, although they will continue to fall back to their own script-based implementations in older browsers.

I Hate Religion

2013-09-07T16:37:33Z

The idea of an invisible, omniscient, omnipotent and omnipresent supernatural being, who was supposedly responsible for the creation of the universe, has never really seemed plausible to me. From a young age, barely old enough to actually comprehend the ludicrous notions being taught in scripture, I rejected it and all the mythology that came with it. I couldn’t, and still can’t, understand why people have to resort to explaining the origin of the universe by saying “god did it”, and yet be content with having no explanation of where this so-called “god” came from. It made no sense to me then, and it still makes no sense to me now.

My disdain for religion began as a young schoolboy, of no more than about 7 or 8, possibly younger. It’s difficult to be more specific than that. Thankfully, I was never forced into religion by my parents, neither of whom are overly religious themselves. Mum seems quite indifferent to the whole thing and while Dad still attends church every weekend, he rejects fundamentalist dogma and biblical literalism, like any rational person should.

Luckily, I attended a public school and so I didn’t have it forced down my throat there either. However, students were still sent to scripture for about half an hour a week, for part of the year, separated into groups by denomination. Unfortunately, the separation of church and state in Australia isn’t quite as clear cut as it’s supposed to be in the USA, and so some religion is still allowed in public schools.

I wasn’t overly happy about that arrangement. I didn’t want to go just to be taught stories I didn’t believe. As far as I know, there was no alternative available for non-believers; or if there was, I don’t know why I wasn’t sent there. So I did what any rebellious kid would do. I acted out in various ways; not always, but frequently enough. Unfortunately, the details of my exploits mostly elude me. It’s hard to remember that far back.

But on one occasion that I do remember, we were given some kind of work sheet to fill out, with various questions about the fables being read to us. I remember repeatedly scrawling phrases like “GOD DOES NOT EXIST!!!” and “JESUS WAS NOT THE SON OF GOD!!!” as my answers. I haven’t a clue what the questions were. I then spent the remaining time filling up the rest of the page with exclamation points as my way of emphasising the fact that I didn’t believe any of that nonsense and really didn’t want to be there. Sadly, I can’t recall the response of the minister when he saw it. I’m sure it wasn’t particularly positive.

So in a sense, I took The Blasphemy Challenge about a decade and a half before it was considered cool, and well before I knew it meant my eternal damnation! But hey, now that I do, I have something to look forward to.

That was by no means my only form of protest during school scriptures. Other times were a little more disruptive. But it was around this time that I vowed, when I grew up, I would fight to have all religion abolished from public schools. I still hope that this will happen one day.

Anyway, this may seem odd for a kid as young as I was back then to be so vehemently opposed to religious dogma. I certainly knew of no other in my position—at least none of my friends were—and there was no-one else in my life from whom my blasphemous anti-religious sentiments spawned. But the fact is, I was an atheist long before I even knew what the word meant, let alone knew of anyone else who shared my disbelief. To be honest, I believed in Santa Claus and the Easter Bunny longer than I believed in a god.

Of course, all of this was well before I knew anything about the scientific explanations for the origin of life, the universe and everything. I knew nothing of the Big Bang Theory, Evolution, nor anything else in between. My rejection of religion was not based on scientific knowledge. So then the question arises how and why did I manage to not only avoid, but to actively reject indoctrination, especially at such a young and impressionable age? The answer to this will become apparent later.

But despite these views of mine, I did in fact occasionally attend scripture at church on Sundays. Not because I was ever dragged there against my will, kicking and screaming; but by my own choice. It was a tough choice to make though, and most of the time I chose not to. By this stage, I had already firmly rejected any sort of faith-based beliefs, and there was no chance of me ever converting. So why did I attend? Simple: because my friends did and sometimes the activities were fun, as was running around playing in the church yard afterwards.

I still find it somewhat amusing that over the years, given my anti-religious convictions, two of my closest friends have been deeply religious. One was the son of our church minister, who sometimes taught my scripture class at school. I’ve no doubt he was on the receiving end of my aforementioned protests, most likely on more than one occasion. He actually respected my views, though, and never tried to force his onto me, and we developed a kind of mutual respect for each other. I suppose it helped that his son and I were good friends. But the irony of this was that I spent many afternoons, after school, hanging out at a minister’s house, almost the last place you’d expect to find a radical atheist.

Unfortunately they moved to another town during the later years of primary school and I’ve not seen much of them since. After this, I never voluntarily attended scripture again.

The problem was that not only did one of my good friends move away, but the replacement minister, who happened to move into that same house in which I’d spent so many memorable afternoons, was not nearly as pleasant or respectful. I despised him for the way he treated me unfairly compared with the rest of the students in the class, largely because of our diametrically opposed views. But, I must admit, it was probably partially compounded by my occasional disruptive, rebellious conduct. But let’s just say he started it and he still owes me a Mintie, and leave it at that.

The other one of my closest friends considers himself to be a born-again, fundamentalist christian who I’m pretty sure still believes everything I don’t. I’m not entirely certain though, as we stopped discussing religion after our arguments started getting in the way of our friendship.

We’re still friends today, but our arguments largely centred around the non-existence of God and the implausibility of many of the biblical myths that he took so literally, such as Adam and Eve; Noah’s Ark; Moses parting the Red Sea; God dictating the Ten Commandments to him; the “virgin” birth; the many miracles claimed to be performed by Jesus; the resurrection, and many other stories that any rationally thinking person would unquestionably reject as myths and allegories, had they not been indoctrinated into believing from childhood.

When I questioned how he could believe, or know any of it was true, the answer always came down to one thing: faith. Nothing but blind, unwavering, unsupported and utterly irrational faith! I’m sure that comes as no surprise, as it’s a fairly typical requirement of any religious person. But it’s the concept of faith that I was never able to grasp, and this is why I rejected religion so early in my life.

I simply could not accept as true: outlandish stories which could not be verified, depended upon unfounded assumptions or invoked supernatural beings or powers, based on nothing more than faith. Nor could I understand what could possibly make any one religion more true or at least more believable than any other.

I viewed all of these myths as outright lies handed down from one generation to the next, infecting people’s minds with irrational belief in what can only be described as fairy tales. It destroys all sense of reason. It discourages critical thinking by encouraging belief in spite of reason. This anti-intellectualism, I thought, was quite dangerous in and of itself, and it had to be stopped. Though it wasn’t till much later in life that I realised just how dangerous religion can be towards not only science and progress in general, but to civilisation as a whole. I now realise that it absolutely must be stopped.

Until this point, my exposure to religion, specifically christianity, had been largely limited to the toned down versions of the biblical myths aimed at children. I had never actually read the real bible in its entirety, and still haven’t read most of it to this day. But I’ve now read enough of it to see how violent, discriminatory, bigoted, immoral and just plain evil the characters and events depicted in the bible, including “God” himself, can be. It’s disgusting!

If there’s one place in the western civilisation where the anti-intellectual, anti-scientific nature of religion is most clearly illustrated, it’s in the god-fearing, bible bashing, United States of America. Led by organisations like Answers in Genesis, Creation Science Evangelism and the Institute for Creation Research, among others, the USA’s constitutional separation of church and state has been and is still being attacked and eroded by fundamentalists.

Ranging from getting the phrase “In God We Trust” added to the US currency; the words “Under God” inserted into the pledge of allegiance; hijacking the Boy Scouts of America and turning it into a homophobic christian organisation; right up to the continuing, though thankfully failed, attempts to get Creationism, Intelligent Design, “Teach the Controversy” or whatever you want to call it next, taught in public schools. It certainly doesn’t help that the current US President considers himself to be a born-again christian, nor that the candidates for the upcoming election aren’t any better in this respect.

To an outside observer, the Creation Museum surely seems like an hilarious attempt at mocking the faith and highlighting the absurdities of these Bronze Age myths. Well, it would be if the anti-science organisation responsible for it wasn’t so serious about actually believing such propaganda. In reality, it’s just sad.

I should point out that even though I’ve focussed largely on the myths and lies of Christianity, that’s only because it’s the one religion I’ve had the most exposure to. But rest assured, when it comes to insulting religion, I’m an equal opportunity offender.

For instance, imagine for the moment that I had been born into an Islamic nation and attended an Islamic school, yet still developed my same contempt for religion. I’m quite sure I’d have been stoned to death by now for my blasphemy. Sadly, that particular sin still carries the death penalty in some sick middle eastern countries that are ruled by an Islamic theocracy. Although I’m quite sure there are some christians who would support the same penalty.

Both the Bible and Qur’an are filled with countless examples of brutally stoning people, rape, torture, discrimination against homosexuals, segregation of women and other horrendous acts. It really is a wonder how any Christian or Muslim can honestly claim their faith as the basis of morality. It’s just absurd.

The bottom line is, I have no respect for religion at all. At least, not the theistic religions. I have mildly more respect for non-theistic religions like Buddhism, which I view as really more of a philosophy and way of life, than a religion, though I’m not really familiar with it.

I respect people, and I not only respect, but would defend anyone’s freedom to believe whatever they want, including religion; but I have no respect for religious belief itself.

I hate religion. Fuck Christianity. Fuck Islam. Fuck Judaism and Hinduism. Fuck Scientology and all other crazy cults. Fuck ’em all. Respect people, not religion. The world would be a much better place without it.

about:internets

2013-09-07T16:37:43Z

Many people didn’t believe Senator Ted Stevens when he said that the Internet was a series of tubes. Well now, thanks to Google, there is proof that he was right! In Google Chrome, if you have it, visit about:internets. This provides a graphical illustration of the internet which looks very much like a series of tubes to me.

It appears to be based upon the Windows 3D Pipes screensaver. It uses the mixed joints, but I’m not sure if it also includes the teapots, like the real screensaver. If you see one, let me know.

Google Chrome

2013-09-07T16:38:10Z

The rumours have been going around the web for years about the possibility of the Google browser, with some rather wild speculation about what exactly it would be like. John Rhodes seems to be one of the earliest to float the idea of the Google Client in September 2001, and in August 2004, based on Google’s relationship with Mozilla at the time, Kottke predicted a Mozilla-based Google browser.

In February this year, it was reported that Google had assembled a team to work on a WebKit-based browser, then known as GBrowser. Now, just over 7 months later, all the rumours and predictions have finally been realised. Google Blogoscoped announced and leaked a comic book entitled Google Chrome earlier today describing many of the innovative features developed for the new browser. Shortly afterwards, the official Google blog admitted that it was mistakenly released a day early.

It should be noted that the concept also includes a few ideas based on features in other browsers, such as Opera’s Speed Dial, and both Firefox and Opera’s address bar (a.k.a. Awesome bar), called omnibox.

The comic was drawn and created by Scott McCloud and has been released under a Creative Commons by-nc-nd 2.5 licence. ~~The comic has currently been taken down due to server load~~ (it’s up again), but I have published a copy of the whole comic here for you to see it, if you haven’t already. You can also download a tarball of all the images.

Internet Explorer 8 Beta 2

2013-09-07T16:38:31Z

A few days ago, the 2nd beta of IE8 was released. Although I haven’t had much time to play with it and find out what it does and doesn’t support, I have come across a few bugs with it.

But one of the big problems we still have with IE is the inability to run several versions side by side. The one solution continually offered by Microsoft is the ability to download virtual machines, which are set to expire after a limited amount of time. The problem with this is that you still need a separate virtual machine for each version of IE you want to run, and after they expire, you need to get a new one.

Anyway, I have found an even better solution. One that lets me run IE6, IE7, IE8b1 and IE8b2 side by side, all within the same copy of Windows XP, which I also have running in a single virtual machine on Mac OS X. I don’t have time to elaborate on the solution now, but I will try to do so over the next few days. For now, here’s a screenshot showing them all running together, with each demonstrating how badly they fail Acid 3.

Interview about HTML5 on Boagworld

2013-09-07T16:38:22Z

Boagworld is a web design and development podcast based in the UK. In today’s episode, they interview me about HTML5. In it, we discuss the current state of HTML5, some of the new features that are already implemented or are being implemented, and what we can expect in the future.