New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update manifest processing to use infra types #103
Conversation
remove commented-out bidi example
One question: should the webidl section also be informative? We're not actually requiring webidl from the processing, so it's more an example of the final data than a usable schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a cursory read and it looks like the right direction.
A couple comments:
- one of my concern is that the output of the algorithm is too vaguely defined. It currently says it's the internal representation, but this has no tangible meaning. My suggestion is to explicitly say that for the purpose of the algorithm, the internal definition is expressed as an Infra map.
- I think I understand the idea behind the "visualization" terminology, but I'm not sure I like it, at least as a normative section. If it's normative, then we have to normatively define how to convert from the Infra type (result of the algorithm) to this normative structure. If it's not normative, then let's move to an appendix.
If the idea is that we want a simple and terse way to represent the data structure, is it a matter of defining a notation for Infra types?
In that case, we could open an issue to Infra, and work with with Infra people to help define that.
You beat me to it, I was writing my review 😊: yes in its current form I think it should be informative. |
Yes, that would be ideal, if it can happen in time with our charter. Given our current options, however, webidl has proven the most stable and understandable way of expressing this info. But I agree with webidl being informative if we keep it. It's not expressing the actual final representation. |
Actually, I may revise my earlier comment in another thread, and would opt to move the WebIDL into an (informative) appendix. If we want to change later by either adding other formalism or even formalisms (see #98 (comment)), we can do that more easily... And with the predominance of infra, this would probably be a better place for it. |
I must admit that, looking through the text, I am reinforced in my opinion that something like the WebIDL is necessary. For my taste the move towards infra did not make the spec more easy to read (maybe even on the contrary, but that may be personal), and such an extra aid is really a good thing to have. |
No, that's what I've been struggling with the last couple of days... ;) But by the end I did find it to be more precise than the rather loose references to javascript and generic object/arrays that we were using. Plus there is the benefit that if devs are already reading this notation in a growing list of other specs, we're not expecting them to figure out something new. (I couldn't make sense of the final data without a cleaner representation, either.) |
As an attempt to define the data structure using Infra, here's what it could look like: A localizable string is a struct with the following items:
(…) A publication manifest is a struct with the following items:
It may not be as concise as a WebIDL dictionary, but is reasonably easy to read? |
The statements are somewhat split up. In the first para, yes, all we say is it defines the internal representation, but right before the note there is:
These can probably be linked better to be clear that the internal representation is defined in terms of infra and results in an infra map, but you can use any language to implement the conversion (provided it can represent the infra data types, of course). |
Note that in my Infra definition above the types are defined as structs rather than maps as currently used in the PR, which seem to be a better fit… |
For expressing the structure, yes, but the conversion of JS to infra changes objects to maps, so the PR is correct. |
Ah right, good point. I knew I had seen that somewhere. |
However, maps can be used as the starting point and converted to structs by the processing algorithm… Anyways, it's probably bike shedding at this proof-of-concept stage 😊 |
Could you do this easily? Don't structs come with a restricted set of members? The incoming data may not match the struct because of the extensibility of the manifest, so would it require a new struct that extends the base struct? (WebIDL doesn't really capture this extensibility, either, I suppose, but if we're actually converting...)
Agreed. |
fix some links referencing the wrong infra definitions; fix some phrasing to better reflect the infra spec
No, I put a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! The requested changes are only the editorial stuff (missing back references). The other questions can be either dealt in this PR or moved to separate issues (to not block this PR).
index.html
Outdated
<p>otherwise, if <var>normalized</var> is a <a | ||
href="https://infra.spec.whatwg.org/#list">list</a>, perform this step on | ||
each <a href="https://infra.spec.whatwg.org/#list-item">item</a> of the | ||
list.</p> | ||
</li> | ||
<li> | ||
<p>otherwise, if <var>normalized</var> is a <a | ||
href="https://infra.spec.whatwg.org/#ordered-map">map</a>, perform this step | ||
on each <a href="https://infra.spec.whatwg.org/#map-key">key</a> of the map.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is "this step" here? there should be a link to said step to clarify that.
is the idea to resolve all the URLs in the list? but then the term still expects a single URL, not a list?
I'm probably missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, looks like I pasted them into the wrong place at some point. Those two steps are supposed to come after the normalizing steps, as they're intended to recursively descend into normalized
to run these operations on subproperties.
If we only go through the primary keys of the manifest, we'll miss normalizing properties inside LinkedResources, for example (or whatever an extension might defined).
index.html
Outdated
<p>if <var>normalized</var> is a <a href="https://infra.spec.whatwg.org/#string" | ||
>string</a> that is not an <a data-cite="!url/#absolute-url-string">absolute | ||
URL string</a>, resolve it to an absolute URL string using the value of | ||
<var>base</var> [[!rfc1808]].</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this comment shouldn't block this PR, it can be moved to a separate issue.
this is tangential to the goal of this PR, but why referencing RFC 1808 here and not use the URL parsing algorithm from the URL standard? This algo takes a base URL as input to perform URL resolution.
Also btw, this step could result in a URL record, not a string. That would even simplify the logic down to:
if normalized is a string, set normalized to the result of running the URL parser with normalized as input and base url as the base URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, I can't track down where 1808 came from, but since we use URL standard everywhere else, I don't see why we wouldn't use it for resolving.
@@ -3020,36 +2919,34 @@ <h2>Processing a Manifest</h2> | |||
manifest</a>, when available.</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(commenting here has this specific text wasn't affected by the PR):
base: a URL string that represents the base URL for the manifest.
should link back to our definition of the manifest base URL.
<p>(<a href="#duration"></a>) If <var>processed["duration"]</var> is set and is not a valid | ||
duration value, per [[!iso8601]], <a>validation error</a>, <a | ||
href="https://infra.spec.whatwg.org/#map-remove">remove</a> | ||
<var>processed["duration"]</var>.</p> | ||
</li> | ||
|
||
<li id="processing-checks-last-mod"> | ||
<p>(<a href="#last-modification-date"></a>) If <var>processed["dateModified"]</var> is set | ||
and its value is not a valid date or date-time per [[!iso8601]], this is a | ||
<a>validation error</a>.</p> | ||
and is not a valid date or date-time per [[!iso8601]], <a>validation error</a>, <a | ||
href="https://infra.spec.whatwg.org/#map-remove">remove</a> | ||
<var>processed["dateModified"]</var>.</p> | ||
</li> | ||
|
||
<li id="processing-checks-pub-date"> | ||
<p>(<a href="#publication-date"></a>) If <var>processed["datePublished"]</var> is set and | ||
its value is not a valid date or date-time per [[!iso8601]], this is a | ||
<a>validation error</a>.</p> | ||
<p>(<a href="#publication-date"></a>) If <var>processed["datePublished"]</var> is set and is | ||
not a valid date or date-time per [[!iso8601]], <a>validation error</a>, <a | ||
href="https://infra.spec.whatwg.org/#map-remove">remove</a> | ||
<var>processed["datePublished"]</var>.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this comment shouldn't block this PR, it can be further discussed in #104 and —if it results in any change— dealt-with in another PR.
do we want to allow smart UA to recover from these? Some might be able to fix an invalid date string, or convert a date in natural language to the ISO format.
One possibility could be to define a generic step like "convert input to the expected type" as a UA-defined step, and either return that or remove.
I'm not sure; what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically for dates this may make sense (I am a little bit afraid we create a precedence, but maybe we are fine), but the step you propose can be read as if the UA MUST try to recover. This may be a step too far (let alone the fact that we would have to test this, and I am not sure how many different strings there are that can be interpreted correctly).
"... is set and is not a valid date or date-time per [[!iso8601]], validation error. If the User Agent is able to interpret the value as a valid date or date-time, replace the value with that generated value; otherwise remove it."
Or something like that, converted to infra-speak...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this lead to interop problems if we can't define how to fix the invalid input? There are a whole host of reasons why the input could be invalid, from the data type being wrong to the date string not conforming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... what is worse, it could lead to wrong interpretations. E.g., what date is meant by "01/02/03"? 1st of February 2003 (in the UK), 2nd of January 2001 (in the US) and, I believe, 3rd of February in 2001 in Japan (or in Hungary, b.t.w.).
Maybe we should keep it as is after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the step you propose can be read as if the UA MUST try to recover.
sorry I wasn't clear. I was suggesting a MAY at best 🙂
Doesn't this lead to interop problems if we can't define how to fix the invalid input?
yeah, that's why I'm not sure.It's not so uncommon to have implementation-defined behavior/processing. There's not a huge risk for interoperability I think, given the nature of the metadata. At worst there will be cases like users not getting their publications sorted in the same order on various UAs if they sort it by date, maybe this isn't too bad? Again, these are open questions, I don't have a strong opinion on this.
it could lead to wrong interpretations
Right. I think that given a language and a date string, a lot can be done (e.g. "Dec 2019", "01/02/03" are not unambiguous in US English). I'm more worried of implementation starting using a default date instead of removing the value if they can't make sense of the authored date.
As always it would be interesting to see if "invalid" dates are commonly found in today's EPUB's for instance, to have an idea whether this would be useful at all.
In any case, keeping it as is may be safer for a first version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, keeping it as is may be safer for a first version?
Yes
move alternate to a validation check; add default for conformsto
I just pushed a change to unlink the webidl from the property definitions, but could be convinced to undo this. Given that we've made the idl informative and moved to an appendix, the bidirectional linking is confusing. |
In addition to @rdeltour's comments, I also added the default profile URL from #101 (I still don't quite understand how this will work, but we can tweak that). Also, per w3c/audiobooks#27, I removed The only way we could allow alternate resources to be strings is if we get into inferring media types from extensions. It may be more complicated than that, though, as we'd have to lower the requirement for We could permit guessing to avoid alternate resources being thrown out, but like the other issue with dates it gets into how much work we want to push onto user agents. |
Hm, I should have double-checked the definition. I was thinking encodingFormat was required except for HTML, but now I see it's just:
So guess we can do both. Expand strings and also validate that encodingFormat is set (but not omit if not). |
Indeed. However, there is a strange effect now (respec is getting in our way): in, say, A.1.1., the terms themselves (like I wonder whether we should not get rid of respec's WebIDL processing here... Also: you have a subsection A.1 for WebIDL. Does it mean that we would want to put other formalisms as "A.2"? |
Is this related to #105 ? My feeling is that there should be no exception for audiobooks, in fact... |
That works for me.
Leaving open that possibility, yes. We can always revise at the end of CR if we still only have webidl to show. |
Ya, I screwed that up by not double-checking what I thought I knew. I put back the expansion of strings, but I'm still confused about why HTML resources get a pass from |
As a new convert to Typescript I believe it would be useful:-) but I am not sure it would bring anything new, so it may be unnecessary. |
add check that conformsto is a list; add variable for storing profile; fix recursion steps in normalization
…as the later recursive steps already handle this
…ck and the value property of each object in the name array needs to be inspected, too; merges the validation checks for linkedresources as there is no need to have them separate; some minor cleanup of variable names
This issue was discussed in a meeting.
View the transcriptupdate manifest processing to use infra typesGarth Conboy: See Issue #101 Garth Conboy: See Issue #98 Ivan Herman: See PR #103 Garth Conboy: There are 2 issues in this pull request - this is also one that Ivan and Matt have… … Does anyone on the call want to run through this and hope to get to a consensus? Ivan Herman: The work was done by Matt. … I would prefer Romain to comment on this as he was the one initiating this whole work and can give a more appropriate background. Romain Deltour: Basically it originated from trying to get rid of the webIGL as a way to describe a datastructure. In trying to harmonize the various manifests in the W3C, we wanted to use infra-types to represent the data structure. Matt’s editing the spec to replace webIGL for infra types. Ivan Herman: See Infra standard Romain Deltour: The WebIDL representative is moved to informative edits. A few different issues were raised when we looked at the algorithm. I think it’s clearer now, but we need more reviews there. It will be interesting to have more. Garth Conboy: Looking at the PR, Romain you have one requested change outstanding and Ivan has approved it. Ivan Herman: I am now repeating what Dave said earlier on the other PR. This was a pretty large change that made the document better and Matt did great work. Now it becomes difficult to discuss other issues and details with this as a PR. Romain Deltour: I think my requests have been addressed in Matt’s latest changes Ivan Herman: I would be in favor of merging this ASAP (as soon as Matt is comfortable) then we can comment and modify once it’s in. Again, this is something he should do… Romain Deltour: +1 on merging and doing more reviews/changes based on Matt’s work Garth Conboy: Sounds good for me. If there are subsequent changes, they will be smaller. Barring objection, why don’t we go ahead and put that as a request for Matt… |
Opening this as a PR to get feedback, as I'm going cross-eyed working on this section. Beyond just switching terminology, I also tried to tighten up the processing to account for potentially other bits of invalid input, to simplify the duplication of language/direction steps, and to consolidate a number of validation checks that were all dependent on recursion.
The WebIDL section is moved into the processing section, and to de-emphasize the technology in the interim I changed the title to "Data Visualization" (but other ideas welcome).
Preview | Diff