atom-syntax@imc.org

Subject: Re: Change Proposal to HTML WG to fix the algorithm for generating Atom feeds from HTML content
From: Tim Bray
Date: Wed Apr 07 18:01:27 2010

I must say, the notion that you can guarantee that a URI can't be
dereferenced is charmingly naive.

I also agree with Julian that it's a big hairy problem that the
language blesses generating two atom entries with different atom:ids
from different revisions of the same HTML resource.

 -Tim

On Tue, Apr 6, 2010 at 2:17 PM, Julian Reschke <[EMAIL PROTECTED]> wrote:
> FYI:
>
> this relates to an HTML-WG discussion about the algorithm to create Atom
> feeds from HTML (<http://dev.w3.org/html5/spec/Overview.html#atom>).
>
> See <http://www.w3.org/Bugs/Public/show_bug.cgi?id=7806> and
> <http://www.w3.org/html/wg/tracker/issues/86> for more context on how we got
> here.
>
> Best regards, Julian
>
> On 06.04.2010 23:12, Julian Reschke wrote:
>>
>> Hi,
>>
>> below is a change proposal for this issue.
>>
>> Note that an obvious alternative to fixing the algorithm would be to
>> remove the section completely.
>>
>> Best regards,
>>
>> Julian
>>
>> -- snip --
>> SUMMARY
>>
>> The HTML5 spec contains an algorithm for producing an Atom (RFC4287)
>> feed document from an HTML page.
>>
>> The definition relaxes a MUST-level requirement from RFC4287 and, at
>> the same time, adds a needless restriction.
>>
>> Also, it's not clear *at all* whether this is a feature that people
>> really want, and if they do, whether it needs to be part of HTML5. Given
>> the fact that it's non-trivial to generate a valid Atom feed from HTML,
>> but the reverse *is* trivial, we should also consider removing this
>> feature altogether (I'd be happy to write a 2nd change proposal if
>> people want to see that as well).
>>
>> RATIONALE
>>
>> Instructions to derive a secondary format from HTML documents shouldn't
>> be misleading, and also should make clear which conditions need to be
>> met to produce valid documents.
>>
>> DETAILS
>>
>> There are two problems, both with the following step (4.15.1, step 15.9
>> as of April 6):
>>
>> "Otherwise
>>
>> Let id be a user-agent-defined undereferenceable yet globally unique
>> valid absolute URL. The same absolute URL should be generated for each
>> run of this algorithm when given the same input. Let has-alternate be
>> false."
>>
>> Problem #1: RFC 4287 does not require the ID to be undereferenceable.
>> This was a conscious decision of the IETF AtomPub WG. There's absolutely
>> no point in adding this requirement, except for the spec author's
>> distaste for URIs that are both dereferenceable *and* act as a globally
>> unique and stable identifier.
>>
>> Note from
>> <http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6.p.2>:
>>
>> "...Though the IRI might use a dereferencable scheme, Atom Processors
>> MUST NOT assume it can be dereferenced."
>>
>> Problem #2: RFC 4287 makes it a MUST-level requirement to generate the
>> same ID every time the feed is regenerated:
>>
>> From
>> <http://greenbytes.de/tech/webdav/rfc4287.html#rfc.section.4.2.6.p.3>:
>>
>> "When an Atom Document is relocated, migrated, syndicated, republished,
>> exported, or imported, the content of its atom:id element MUST NOT
>> change. Put another way, an atom:id element pertains to all
>> instantiations of a particular Atom entry or feed; revisions retain the
>> same content in their atom:id elements. It is suggested that the atom:id
>> element be stored along with the associated resource."
>>
>> HTML5 relaxes this to a should-level requirement.
>>
>> I do agree that generating valid Atom feeds from HTML *is* hard, but
>> violating a MUST-level requirement from the Atom spec is not acceptable.
>>
>> Proposed changes:
>>
>> For issue #1:
>>
>> Leave out "undereferenceable", changing the sentence to:
>>
>> "Let id be a user-agent-defined yet globally unique valid absolute URL."
>>
>> For issue #2:
>>
>> Change
>>
>> "The same absolute URL should be generated for each run of this
>> algorithm when given the same input."
>>
>> to
>>
>> "The same absolute URL must be generated for each run of this algorithm
>> when given the same input. If this requirement cannot be fulfilled,
>> then generating a valid Atom feed is not possible and this algorithm
>> should be aborted."
>>
>>
>> IMPACT
>>
>> 1. Positive Effects
>>
>> Consistency between the applicable specs. Also, authors are correctly
>> informed about what it takes to generate proper Atom feeds.
>>
>> 2. Negative Effects
>>
>> None.
>>
>> 3. Conformance Classes Changes
>>
>> Atom feed generators are actually required to generate valid Atom
>> documents (with respect to atom:id).
>>
>> 4. Risks
>>
>> None.
>>
>> REFERENCES
>>
>> Inline.
>
>
>
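
To make the "same input, same id" requirement in the quoted proposal concrete, here is a minimal sketch of one way a generator could derive a repeatable atom:id. It is an illustration only, not what the HTML5 draft or RFC 4287 prescribes: the function name stable_atom_id and its parameters are hypothetical, and it assumes the generator knows a stable URL for the source document. The id is derived from that URL rather than from the page content, so a later revision of the same resource keeps the same id.

```python
import uuid


def stable_atom_id(source_url: str, entry_key: str = "") -> str:
    """Derive a repeatable atom:id from stable inputs (hypothetical helper).

    source_url: the address of the HTML document (assumed stable).
    entry_key:  an optional stable key for one entry within the document,
                for example an element id (assumption, not from the spec).
    """
    # Only stable identifiers go into the name, never the page content,
    # so revisions of the same resource map to the same id.
    name = f"{source_url}#{entry_key}" if entry_key else source_url
    # uuid5 is a name-based (SHA-1) UUID: the same namespace and name
    # always yield the same UUID.
    return f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


if __name__ == "__main__":
    first = stable_atom_id("http://example.org/blog/", "post-1")
    second = stable_atom_id("http://example.org/blog/", "post-1")
    print(first == second)  # the id is identical across runs
```

If no stable key is available for the document or for an entry, this approach does not help, which is the case where the proposal says the algorithm should be aborted.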