duplicate URLs in reading order and resource list #138

dauwhe · 2019-10-25T13:53:15Z

Our spec says

> Note that a particular resource's URL MUST NOT appear in more than one of these lists, and a URL MUST NOT be repeated within a list.

I can at least imagine content that involves repetition. Poems may have repeated stanzas. It's not impossible to imagine experimental fiction where long sections of content may be repeated. It feels like we are legislating against errors here, but in a way that restricts the freedom of authors. Perhaps this is worthy of discussion?

This could especially become an issue with profiles where the reading order consists of images or audio.

iherman · 2019-10-25T14:19:39Z

Actually... your example violates another restriction in the current spec:

> The URLs expressed in the reading order MUST NOT include fragment identifiers.

(I presume you would use a fragment identifier to point to stanzas...)

To be honest, I do not know where these restrictions come from.

dauwhe · 2019-10-25T14:38:49Z

> (I presume you would use a fragment identifier to point to stanzas...)

Not necessarily. If the repetition were hundreds of lines it might be a separate file. But now we are asking authors to divide up their content only in certain ways.

Once again we hit a fundamental philosophical difference between EPUB and the web. With EPUB we help authors avoid errors. The web allows authors immense freedom to make mistakes.

Should web publications be like EPUB, or be like the web?

mattgarrish · 2019-10-25T22:40:51Z

But how would you reuse the exact same file in a sequence of pages on the web and still direct the user to the correct previous/next page? Hope you get a meaningful referrer?

I believe the problem is the same for reading systems, as where exactly is the reader if the same file is defined at multiple positions? There will be pathways and situations (resumption) that will complicate knowing the context of the file.

That's my recollection of how we came to this restriction, although I haven't been able to track down where it was discussed.

mattgarrish · 2019-10-28T17:48:25Z

> Note that a particular resource's URL MUST NOT appear in more than one of these lists

This part I can get behind reducing to a should, or even dropping. What does it matter if a resource is listed in both when you have to get a union to find the unique list of resources? What breaks by having a resource declared in both lists? It's convenient not to have to list in both, but that shouldn't prevent someone from being super detailed...

Or, put differently, why bother saying it's a union when by definition you should only have to join the lists if you can't repeat resources (doing a union would only matter for error checking).
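The point about unions can be illustrated with a minimal sketch, assuming the two lists are plain Python lists of URL strings. The names `reading_order`, `resources`, and `unique_resources`, and the example URLs, are hypothetical and not taken from the spec. When no URL repeats, joining the lists and taking their union give the same result; the two differ only when a resource is declared twice.

```python
def unique_resources(reading_order, resources):
    """Return the unique URLs across both lists, preserving first-seen order."""
    seen = set()
    unique = []
    for url in reading_order + resources:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


# Hypothetical manifest data: chapter1.html is declared in both lists.
reading_order = ["chapter1.html", "chapter2.html"]
resources = ["style.css", "chapter1.html"]

joined = reading_order + resources                   # plain join keeps the repeat
union = unique_resources(reading_order, resources)   # union drops it
```

Here `joined` has four entries and `union` has three, so the union step only matters when the "no repeats" rule has been broken.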

dauwhe · 2019-10-28T17:53:40Z

> What does it matter if a resource is listed in both when you have to get a union to find the unique list of resources? What breaks by having a resource declared in both lists?

Yes. It feels like we're focusing a bit on theoretical purity, rather than being as minimally restrictive as possible while still creating a spec that's implementable.

mattgarrish · 2019-10-28T18:01:38Z

I think this statement is a bit worthless as far as normative statements go, too:

> The completeness of the resource list can affect the usability of a digital publication in certain reading scenarios (e.g., the ability to read it offline). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the publication's constituent resources beyond those listed in the default reading order.

"strongly advised", yes, but otherwise what sort of checking is this supposed to entail? I'm not writing an algorithm to hunt for possible missed "comprehensive" resources... :)

mattgarrish · 2019-10-29T11:46:27Z

To try and address all the issues, how about this as a resolution:

- we delete the requirements that a URL not appear more than once in a list (this seems problematic for the links section) and not in more than one list
- for the reading order:
  - to avoid rendering ambiguities, resources should not be listed more than once in the default reading order (i.e., we warn but don't strip duplicates)
  - ~~fragment identifiers should not be included in the default reading order, but if they are we warn about them and strip them during processing~~
  - fragment identifiers may be included, but are regulated at the profile level
- for the resource list:
  - to avoid conflicting information, resources should not be listed more than once ~~-if they are, only the first instance is retained during processing~~
  - ~~fragment identifiers should not be included, warn and strip~~
- if a resource has already been declared in the reading order, any subsequent declaration(s) in the resource list are quietly ignored when compiling the list of resources in the bounds
- it is only advised to include a comprehensive list of resources
- it is not necessary to include a reference to the manifest (not a "must not")
- the restriction against using a fragment identifier for toc/pagelist/cover is removed since there is no longer a conflict with the readingOrder/resources rules - if a fragment isn't used, the first instance found in the file is used as per the existing rules
- add that cover must not be specified in the links list to match the other structural properties
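The duplicate-handling rules proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the spec's algorithm: the lists are assumed to be plain Python lists of URL strings, URLs are compared as exact strings (so two URLs differing only in their fragment count as different), and the function name `compile_bounds` and the warning texts are hypothetical.

```python
def compile_bounds(reading_order, resources):
    """Compile the unique list of URLs in the bounds, collecting warnings.

    Neither input list is modified: duplicates are reported, not stripped.
    """
    warnings = []
    bounds = []
    seen = set()

    # Reading order: a repeated URL triggers a warning only.
    for url in reading_order:
        if url in seen:
            warnings.append(f"reading order repeats {url}")
        else:
            seen.add(url)
            bounds.append(url)

    in_reading_order = set(reading_order)
    seen_in_resources = set()
    for url in resources:
        if url in in_reading_order:
            continue  # already declared in the reading order: quietly ignored
        if url in seen_in_resources:
            warnings.append(f"resource list repeats {url}")
            continue
        seen_in_resources.add(url)
        bounds.append(url)

    return bounds, warnings


# Hypothetical example data.
bounds, warnings = compile_bounds(
    ["a.html", "b.html", "a.html"],
    ["style.css", "b.html", "style.css"],
)
```

In this example `b.html` in the resource list produces no warning because it was already declared in the reading order, while the repeated `a.html` and `style.css` each produce one.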

mattgarrish · 2019-10-29T11:50:24Z

Also:

if a resource has already been declared in the reading order, any subsequent declaration(s) in the resource list are quietly ignored when compiling the list of resources in the bounds

iherman · 2019-10-29T12:32:33Z

That works for me.

HadrienGardeur · 2019-10-30T16:43:46Z

From a UA perspective:

duplicated resources in the resource list is sub-optimal but not a major issue
duplicated resources in the reading order would cause major headaches

Let's say that a resource is present twice in the reading order. The first time at 20% of the book, the second time at 80% of the book.
When I follow a link to that resource, should I consider that this is for the first or second resource?
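The ambiguity can be made concrete with a small sketch, assuming the reading order is a Python list of URL strings and that a user agent locates a link target by searching that list. Both the example data and the `positions_of` helper are hypothetical.

```python
def positions_of(reading_order, target):
    """Return every index at which the target URL occurs in the reading order."""
    return [index for index, url in enumerate(reading_order) if url == target]


# Hypothetical reading order in which one file is listed twice.
reading_order = [
    "intro.html",
    "refrain.html",
    "middle.html",
    "refrain.html",
    "end.html",
]

matches = positions_of(reading_order, "refrain.html")
# More than one match means the URL alone cannot tell the user agent
# which occurrence a link points to, or what "next" and "previous" are.
is_ambiguous = len(matches) > 1
```

With a single occurrence per URL, the lookup always yields exactly one position, which is what the restriction on the reading order preserves.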

mattgarrish · 2019-10-30T18:39:27Z

> When I follow a link to that resource, should I consider that this is for the first or second resource?

Yes, I'm personally more comfortable leaving that particular restriction in place. Even in an audio format it seems flawed, as links from a table of contents could introduce this confusion. I'm going to un-strike that and see where the discussion Monday leads on this.

dauwhe · 2019-11-04T14:32:15Z

> Yes, I'm personally more comfortable leaving that particular restriction in place. Even in an audio format it seems flawed, as links from a table of contents could introduce this confusion. I'm going to un-strike that and see where the discussion Monday leads on this.

I think the TOC case is solvable, with some programming, but the arbitrary link problem is more difficult. I'm OK with keeping the restriction.

mattgarrish · 2019-11-04T14:54:49Z

> I think the TOC case is solvable, with some programming, but the arbitrary link problem is more difficult.

It's the pathway to EPUB CFI to make both reliable, but let's not go there...

iherman · 2019-11-05T08:48:20Z

This issue was discussed in a meeting.

No actions or resolutions

View the transcript

Garth Conboy: #138
Garth Conboy: #131 (corresponding PR)
Matt Garrish: a number of issues came up last week… dauwhe had raised a couple of questions… going through algo to implement the spec
… the major ones in 138 are that we’re not going to have a MUST NOT on fragment ids on manifest level, but could be restricted at the profile level
… the other one (PR 131) also deals with publication of resources in reading order and resource list
… we had “MUST NOT” but we will leave it at “SHOULD NOT”… if it’s there, we won’t mess around with your reading order
… strongly advised that you don’t declare resources multiple times in reading order… leads to ambiguity
… we want to generally avoid that issue
… other than that, the changes were mostly editorial cleanup
… to recap: fragment ids are allowed, and you can declare multiple resources, and it will be a warning (still)
… we can discuss the resource issues now… i don’t think we need to restrict fragment ids at the publication level
Garth Conboy: nobody on the queue… ivan made an off comment on alternate issues… what did you mean?
Ivan Herman: we also had an issue in the last few days on what happens if you provide an alternate [?]
Matt Garrish: choosing alternates is issue 133… separate from 138
Garth Conboy: let’s stick with 138 for the moment
… nobody on the queue… dauwhe, you had chimed in on this issue. are you happy with where we are?
Dave Cramer: yes, i think allowing fragment ids is important, but i’m fine with the decision on duplicate resources… when we make restrictions I want us to be very conscious about what we’re restricting and why

iherman · 2019-11-05T08:48:50Z

@mattgarrish can this be closed now?

iherman · 2019-11-05T08:49:45Z

(The corresponding PR, i.e. #131, has been merged...)

dauwhe mentioned this issue Oct 25, 2019

Missing validation steps on URL-s #137

Closed

mattgarrish mentioned this issue Oct 30, 2019

Add section on publication resources #131

Merged

iherman added the propose closing label Nov 5, 2019

mattgarrish closed this as completed Nov 5, 2019
