Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update manifest processing to use infra types #103

Merged
merged 13 commits into from Oct 14, 2019

Conversation

mattgarrish
Copy link
Member

@mattgarrish mattgarrish commented Oct 8, 2019

Opening this as a PR to get feedback, as I'm going cross-eyed working on this section. Beyond just switching terminology, I also tried to tighten up the processing to account for potentially other bits of invalid input, to simplify the duplication of language/direction steps, and to consolidate a number of validation checks that were all dependent on recursion.

The WebIDL section is moved into the processing section, and to de-emphasize the technology in the interim I changed the title to "Data Visualization" (but other ideas welcome).


Preview | Diff

remove commented-out bidi example
@mattgarrish
Copy link
Member Author

mattgarrish commented Oct 9, 2019

One question: should the webidl section also be informative?

We're not actually requiring webidl from the processing, so it's more an example of the final data than a usable schema.

Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a cursory read and it looks like the right direction.

A couple comments:

  • one of my concern is that the output of the algorithm is too vaguely defined. It currently says it's the internal representation, but this has no tangible meaning. My suggestion is to explicitly say that for the purpose of the algorithm, the internal definition is expressed as an Infra map.
  • I think I understand the idea behind the "visualization" terminology, but I'm not sure I like it, at least as a normative section. If it's normative, then we have to normatively define how to convert from the Infra type (result of the algorithm) to this normative structure. If it's not normative, then let's move to an appendix.

If the idea is that we want a simple and terse way to represent the data structure, is it a matter of defining a notation for Infra types?
In that case, we could open an issue to Infra, and work with with Infra people to help define that.

@rdeltour
Copy link
Member

rdeltour commented Oct 9, 2019

One question: should the webidl section also be informative?

You beat me to it, I was writing my review 😊: yes in its current form I think it should be informative.

@mattgarrish
Copy link
Member Author

If the idea is that we want a simple and terse way to represent the data structure, is it a matter of defining a notation for Infra types?

Yes, that would be ideal, if it can happen in time with our charter. Given our current options, however, webidl has proven the most stable and understandable way of expressing this info.

But I agree with webidl being informative if we keep it. It's not expressing the actual final representation.

index.html Show resolved Hide resolved
index.html Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
@iherman
Copy link
Member

iherman commented Oct 9, 2019

Actually, I may revise my earlier comment in another thread, and would opt to move the WebIDL into an (informative) appendix. If we want to change later by either adding other formalism or even formalisms (see #98 (comment)), we can do that more easily... And with the predominance of infra, this would probably be a better place for it.

@iherman
Copy link
Member

iherman commented Oct 9, 2019

I must admit that, looking through the text, I am reinforced in my opinion that something like the WebIDL is necessary. For my taste the move towards infra did not make the spec more easy to read (maybe even on the contrary, but that may be personal), and such an extra aid is really a good thing to have.

@mattgarrish
Copy link
Member Author

For my taste the move towards infra did not make the spec more easy to read

No, that's what I've been struggling with the last couple of days... ;)

But by the end I did find it to be more precise than the rather loose references to javascript and generic object/arrays that we were using. Plus there is the benefit that if devs are already reading this notation in a growing list of other specs, we're not expecting them to figure out something new.

(I couldn't make sense of the final data without a cleaner representation, either.)

@rdeltour
Copy link
Member

rdeltour commented Oct 9, 2019

As an attempt to define the data structure using Infra, here's what it could look like:


A localizable string is a struct with the following items:

(…)

A publication manifest is a struct with the following items:


It may not be as concise as a WebIDL dictionary, but is reasonably easy to read?

@mattgarrish
Copy link
Member Author

* one of my concern is that the output of the algorithm is too vaguely defined. It currently says it's the _internal representation_, but this has no tangible meaning

The statements are somewhat split up. In the first para, yes, all we say is it defines the internal representation, but right before the note there is:

If successful, the algorithm returns a map containing the internal representation of the data.

These can probably be linked better to be clear that the internal representation is defined in terms of infra and results in an infra map, but you can use any language to implement the conversion (provided it can represent the infra data types, of course).

@rdeltour
Copy link
Member

rdeltour commented Oct 9, 2019

Note that in my Infra definition above the types are defined as structs rather than maps as currently used in the PR, which seem to be a better fit…

@mattgarrish
Copy link
Member Author

the types are defined as structs rather than maps as currently used in the PR, which seem to be a better fit…

For expressing the structure, yes, but the conversion of JS to infra changes objects to maps, so the PR is correct.

@rdeltour
Copy link
Member

rdeltour commented Oct 9, 2019

the conversion of JS to infra changes objects to maps

Ah right, good point. I knew I had seen that somewhere.

@rdeltour
Copy link
Member

rdeltour commented Oct 9, 2019

the conversion of JS to infra changes objects to maps

However, maps can be used as the starting point and converted to structs by the processing algorithm… Anyways, it's probably bike shedding at this proof-of-concept stage 😊

@mattgarrish
Copy link
Member Author

However, maps can be used as the starting point and converted to structs by the processing algorithm…

Could you do this easily? Don't structs come with a restricted set of members? The incoming data may not match the struct because of the extensibility of the manifest, so would it require a new struct that extends the base struct?

(WebIDL doesn't really capture this extensibility, either, I suppose, but if we're actually converting...)

Anyways, it's probably bike shedding at this proof-of-concept stage

Agreed.

fix some links referencing the wrong infra definitions;
fix some phrasing to better reflect the infra spec
@iherman
Copy link
Member

iherman commented Oct 10, 2019

I am not sure what is happening, and I cannot locate it on the diff view above: It seems that part of 7.1 (setting the name to the value of <title> as a default) is missing toward the end. This is what I see on my screen (I checked it with two different browsers):

Screen Shot 2019-10-10 at 06 23 07

@mattgarrish
Copy link
Member Author

It seems that part of 7.1 (setting the name to the value of <title> as a default) is missing toward the end.

No, I put a pre where there should be a code. It's saying to set name to a list with title as the only item:

Set processed["name"] to « title ».

Copy link
Member

@rdeltour rdeltour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The requested changes are only the editorial stuff (missing back references). The other questions can be either dealt in this PR or moved to separate issues (to not block this PR).

index.html Outdated
Comment on lines 3375 to 3383
<p>otherwise, if <var>normalized</var> is a <a
href="https://infra.spec.whatwg.org/#list">list</a>, perform this step on
each <a href="https://infra.spec.whatwg.org/#list-item">item</a> of the
list.</p>
</li>
<li>
<p>otherwise, if <var>normalized</var> is a <a
href="https://infra.spec.whatwg.org/#ordered-map">map</a>, perform this step
on each <a href="https://infra.spec.whatwg.org/#map-key">key</a> of the map.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "this step" here? there should be a link to said step to clarify that.
is the idea to resolve all the URLs in the list? but then the term still expects a single URL, not a list?
I'm probably missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, looks like I pasted them into the wrong place at some point. Those two steps are supposed to come after the normalizing steps, as they're intended to recursively descend into normalized to run these operations on subproperties.

If we only go through the primary keys of the manifest, we'll miss normalizing properties inside LinkedResources, for example (or whatever an extension might defined).

index.html Outdated
<p>if <var>normalized</var> is a <a href="https://infra.spec.whatwg.org/#string"
>string</a> that is not an <a data-cite="!url/#absolute-url-string">absolute
URL string</a>, resolve it to an absolute URL string using the value of
<var>base</var>&#160;[[!rfc1808]].</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this comment shouldn't block this PR, it can be moved to a separate issue.

this is tangential to the goal of this PR, but why referencing RFC 1808 here and not use the URL parsing algorithm from the URL standard? This algo takes a base URL as input to perform URL resolution.

Also btw, this step could result in a URL record, not a string. That would even simplify the logic down to:

if normalized is a string, set normalized to the result of running the URL parser with normalized as input and base url as the base URL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I can't track down where 1808 came from, but since we use URL standard everywhere else, I don't see why we wouldn't use it for resolving.

@@ -3020,36 +2919,34 @@ <h2>Processing a Manifest</h2>
manifest</a>, when available.</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(commenting here has this specific text wasn't affected by the PR):

base: a URL string that represents the base URL for the manifest.

should link back to our definition of the manifest base URL.

Comment on lines +3604 to +3621
<p>(<a href="#duration"></a>) If <var>processed["duration"]</var> is set and is not a valid
duration value, per&#160;[[!iso8601]], <a>validation error</a>, <a
href="https://infra.spec.whatwg.org/#map-remove">remove</a>
<var>processed["duration"]</var>.</p>
</li>

<li id="processing-checks-last-mod">
<p>(<a href="#last-modification-date"></a>) If <var>processed["dateModified"]</var> is set
and its value is not a valid date or date-time per&#160;[[!iso8601]], this is a
<a>validation error</a>.</p>
and is not a valid date or date-time per&#160;[[!iso8601]], <a>validation error</a>, <a
href="https://infra.spec.whatwg.org/#map-remove">remove</a>
<var>processed["dateModified"]</var>.</p>
</li>

<li id="processing-checks-pub-date">
<p>(<a href="#publication-date"></a>) If <var>processed["datePublished"]</var> is set and
its value is not a valid date or date-time per&#160;[[!iso8601]], this is a
<a>validation error</a>.</p>
<p>(<a href="#publication-date"></a>) If <var>processed["datePublished"]</var> is set and is
not a valid date or date-time per&#160;[[!iso8601]], <a>validation error</a>, <a
href="https://infra.spec.whatwg.org/#map-remove">remove</a>
<var>processed["datePublished"]</var>.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this comment shouldn't block this PR, it can be further discussed in #104 and —if it results in any change— dealt-with in another PR.

do we want to allow smart UA to recover from these? Some might be able to fix an invalid date string, or convert a date in natural language to the ISO format.

One possibility could be to define a generic step like "convert input to the expected type" as a UA-defined step, and either return that or remove.

I'm not sure; what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically for dates this may make sense (I am a little bit afraid we create a precedence, but maybe we are fine), but the step you propose can be read as if the UA MUST try to recover. This may be a step too far (let alone the fact that we would have to test this, and I am not sure how many different strings there are that can be interpreted correctly).

"... is set and is not a valid date or date-time per [[!iso8601]], validation error. If the User Agent is able to interpret the value as a valid date or date-time, replace the value with that generated value; otherwise remove it."

Or something like that, converted to infra-speak...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this lead to interop problems if we can't define how to fix the invalid input? There are a whole host of reasons why the input could be invalid, from the data type being wrong to the date string not conforming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... what is worse, it could lead to wrong interpretations. E.g., what date is meant by "01/02/03"? 1st of February 2003 (in the UK), 2nd of January 2001 (in the US) and, I believe, 3rd of February in 2001 in Japan (or in Hungary, b.t.w.).

Maybe we should keep it as is after all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the step you propose can be read as if the UA MUST try to recover.

sorry I wasn't clear. I was suggesting a MAY at best 🙂

Doesn't this lead to interop problems if we can't define how to fix the invalid input?

yeah, that's why I'm not sure.It's not so uncommon to have implementation-defined behavior/processing. There's not a huge risk for interoperability I think, given the nature of the metadata. At worst there will be cases like users not getting their publications sorted in the same order on various UAs if they sort it by date, maybe this isn't too bad? Again, these are open questions, I don't have a strong opinion on this.

it could lead to wrong interpretations

Right. I think that given a language and a date string, a lot can be done (e.g. "Dec 2019", "01/02/03" are not unambiguous in US English). I'm more worried of implementation starting using a default date instead of removing the value if they can't make sense of the authored date.

As always it would be interesting to see if "invalid" dates are commonly found in today's EPUB's for instance, to have an idea whether this would be useful at all.

In any case, keeping it as is may be safer for a first version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, keeping it as is may be safer for a first version?

Yes

move alternate to a validation check;
add default for conformsto
@mattgarrish
Copy link
Member Author

I just pushed a change to unlink the webidl from the property definitions, but could be convinced to undo this.

Given that we've made the idl informative and moved to an appendix, the bidirectional linking is confusing.

@mattgarrish
Copy link
Member Author

mattgarrish commented Oct 11, 2019

In addition to @rdeltour's comments, I also added the default profile URL from #101 (I still don't quite understand how this will work, but we can tweak that).

Also, per w3c/audiobooks#27, I removed alternate from the step to expand LinkedResources and put it into the validation steps to make sure that url and encodingFormat are always set.

The only way we could allow alternate resources to be strings is if we get into inferring media types from extensions. It may be more complicated than that, though, as we'd have to lower the requirement for encodingFormat to a should (i.e., we'd have to bless media type guessing).

We could permit guessing to avoid alternate resources being thrown out, but like the other issue with dates it gets into how much work we want to push onto user agents.

@mattgarrish
Copy link
Member Author

Hm, I should have double-checked the definition. I was thinking encodingFormat was required except for HTML, but now I see it's just:

Non-HTML resources SHOULD be expressed as LinkedResource objects with their encodingFormat values set.

So guess we can do both. Expand strings and also validate that encodingFormat is set (but not omit if not).

@iherman
Copy link
Member

iherman commented Oct 12, 2019

I just pushed a change to unlink the webidl from the property definitions, but could be convinced to undo this.

Given that we've made the idl informative and moved to an appendix, the bidirectional linking is confusing.

Indeed. However, there is a strange effect now (respec is getting in our way): in, say, A.1.1., the terms themselves (like abridged) are active links which link to, I believe, themselves, ie, not back into the text. Which is a bit strange.

I wonder whether we should not get rid of respec's WebIDL processing here...

Also: you have a subsection A.1 for WebIDL. Does it mean that we would want to put other formalisms as "A.2"?

@iherman
Copy link
Member

iherman commented Oct 12, 2019

Also, per w3c/audiobooks#27, I removed alternate from the step to expand LinkedResources and put it into the validation steps to make sure that url and encodingFormat are always set.

Is this related to #105 ? My feeling is that there should be no exception for audiobooks, in fact...

@mattgarrish
Copy link
Member Author

I wonder whether we should not get rid of respec's WebIDL processing here...

That works for me.

Does it mean that we would want to put other formalisms as "A.2"?

Leaving open that possibility, yes. We can always revise at the end of CR if we still only have webidl to show.

@mattgarrish
Copy link
Member Author

My feeling is that there should be no exception for audiobooks, in fact...

Ya, I screwed that up by not double-checking what I thought I knew. I put back the expansion of strings, but I'm still confused about why HTML resources get a pass from encodingType. As I just mentioned in #105, either warnings for all or no warnings.

@iherman
Copy link
Member

iherman commented Oct 12, 2019

Leaving open that possibility, yes. We can always revise at the end of CR if we still only have webidl to show.

As a new convert to Typescript I believe it would be useful:-) but I am not sure it would bring anything new, so it may be unnecessary.

add check that conformsto is a list;
add variable for storing profile;
fix recursion steps in normalization
…as the later recursive steps already handle this
…ck and the value property of each object in the name array needs to be inspected, too;

merges the validation checks for linkedresources as there is no need to have them separate;
some minor cleanup of variable names
@iherman
Copy link
Member

iherman commented Oct 14, 2019

This issue was discussed in a meeting.

  • No actions or resolutions
View the transcript update manifest processing to use infra types
Garth Conboy: See Issue #101
Garth Conboy: See Issue #98
Ivan Herman: See PR #103
Garth Conboy: There are 2 issues in this pull request - this is also one that Ivan and Matt have…
… Does anyone on the call want to run through this and hope to get to a consensus?
Ivan Herman: The work was done by Matt.
… I would prefer Romain to comment on this as he was the one initiating this whole work and can give a more appropriate background.
Romain Deltour: Basically it originated from trying to get rid of the webIGL as a way to describe a datastructure. In trying to harmonize the various manifests in the W3C, we wanted to use infra-types to represent the data structure. Matt’s editing the spec to replace webIGL for infra types.
Ivan Herman: See Infra standard
Romain Deltour: The WebIDL representative is moved to informative edits. A few different issues were raised when we looked at the algorithm. I think it’s clearer now, but we need more reviews there. It will be interesting to have more.
Garth Conboy: Looking at the PR, Romain you have one requested change outstanding and Ivan has approved it.
Ivan Herman: I am now repeating what Dave said earlier on the other PR. This was a pretty large change that made the document better and Matt did great work. Now it becomes difficult to discuss other issues and details with this as a PR.
Romain Deltour: I think my requests have been addressed in Matt’s latest changes
Ivan Herman: I would be in favor of merging this ASAP (as soon as Matt is comfortable) then we can comment and modify once it’s in. Again, this is something he should do…
Romain Deltour: +1 on merging and doing more reviews/changes based on Matt’s work
Garth Conboy: Sounds good for me. If there are subsequent changes, they will be smaller. Barring objection, why don’t we go ahead and put that as a request for Matt…

@mattgarrish mattgarrish merged commit 07f264d into master Oct 14, 2019
@mattgarrish mattgarrish deleted the editorial/infra-compat branch October 14, 2019 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants