duplicate URLs in reading order and resource list #138

Closed
dauwhe opened this issue Oct 25, 2019 · 16 comments

Comments

@dauwhe

dauwhe commented Oct 25, 2019

Our spec says

Note that a particular resource's URL MUST NOT appear in more than one of these lists, and a URL MUST NOT be repeated within a list.

I can at least imagine content that involves repetition. Poems may have repeated stanzas. It's not impossible to imagine experimental fiction where long sections of content may be repeated. It feels like we are legislating against errors here, but in a way that restricts the freedom of authors. Perhaps this is worthy of discussion?

This could especially become an issue with profiles where the reading order consists of images or audio.

@iherman
Member

iherman commented Oct 25, 2019

Actually... your example violates another restriction in the current spec:

The URLs expressed in the reading order MUST NOT include fragment identifiers.

(I presume you would use a fragment identifier to point to stanzas...)

To be honest, I do not know where these restrictions come from.

@dauwhe
Author

dauwhe commented Oct 25, 2019

(I presume you would use a fragment identifier to point to stanzas...)

Not necessarily. If the repetition were hundreds of lines it might be a separate file. But now we are asking authors to divide up their content only in certain ways.

Once again we hit a fundamental philosophical difference between EPUB and the web. With EPUB we help authors avoid errors. The web allows authors immense freedom to make mistakes.

Should web publications be like EPUB, or be like the web?

@mattgarrish
Member

But how would you reuse the exact same file in a sequence of pages on the web and still direct the user to the correct previous/next page? Hope you get a meaningful referrer?

I believe the problem is the same for reading systems, as where exactly is the reader if the same file is defined at multiple positions? There will be pathways and situations (resumption) that will complicate knowing the context of the file.

That's my recollection of how we came to this restriction, although I haven't been able to track down where it was discussed.

@mattgarrish
Member

Note that a particular resource's URL MUST NOT appear in more than one of these lists

This part I can get behind reducing to a should, or even dropping. What does it matter if a resource is listed in both when you have to get a union to find the unique list of resources? What breaks by having a resource declared in both lists? It's convenient not to have to list in both, but that shouldn't prevent someone from being super detailed...

Or, put differently, why bother saying it's a union when by definition you should only have to join the lists if you can't repeat resources (doing a union would only matter for error checking).
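
A rough sketch of the union point, for illustration only (the list contents and the helper name `unique_resources` are made up, not taken from the spec). If repeats are forbidden, a plain join already gives the unique list; the union step only changes the result when a resource is declared in both lists:

```python
# Hypothetical manifest lists; the URLs are invented for illustration.
reading_order = ["cover.html", "chapter1.html", "chapter2.html"]
resources = ["style.css", "chapter1.html", "font.woff"]  # chapter1.html is repeated


def unique_resources(*lists):
    """Order-preserving union: each URL is kept once, at its first appearance."""
    seen = set()
    result = []
    for urls in lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                result.append(url)
    return result


joined = reading_order + resources                   # plain join keeps the repeat
bounds = unique_resources(reading_order, resources)  # union drops it
print(len(joined), len(bounds))
```

Nothing breaks here when `chapter1.html` appears in both lists; the union simply absorbs the repeat.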

@dauwhe
Author

dauwhe commented Oct 28, 2019

What does it matter if a resource is listed in both when you have to get a union to find the unique list of resources? What breaks by having a resource declared in both lists?

Yes. It feels like we're focusing a bit on theoretical purity, rather than being as minimally restrictive as possible while still creating a spec that's implementable.

@mattgarrish
Member

I think this statement is a bit worthless as far as normative statements go, too:

The completeness of the resource list can affect the usability of a digital publication in certain reading scenarios (e.g., the ability to read it offline). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the publication's constituent resources beyond those listed in the default reading order.

"strongly advised", yes, but otherwise what sort of checking is this supposed to entail? I'm not writing an algorithm to hunt for possible missed "comprehensive" resources... :)

@mattgarrish
Member

mattgarrish commented Oct 29, 2019

To try and address all the issues, how about this as a resolution:

  • we delete the requirements that a URL not appear more than once in a list (this seems problematic for the links section) and not in more than one list
  • for the reading order:
    • to avoid rendering ambiguities, resources should not be listed more than once in the default reading order (i.e., we warn but don't strip duplicates)
    • fragment identifiers should not be included in the default reading order, but if they are we warn about them and strip them during processing
    • fragment identifiers may be included, but are regulated at the profile level
  • for the resource list:
    • to avoid conflicting information, resources should not be listed more than once; if they are, only the first instance is retained during processing
    • fragment identifiers should not be included, warn and strip
  • if a resource has already been declared in the reading order, any subsequent declaration(s) in the resource list are quietly ignored when compiling the list of resources in the bounds
  • it is only advised to include a comprehensive list of resources
  • it is not necessary to include a reference to the manifest (not a "must not")
  • the restriction against using a fragment identifier for toc/pagelist/cover is removed since there is no longer a conflict with the readingOrder/resources rules - if a fragment isn't used, the first instance found in the file is used as per the existing rules
  • add that cover must not be specified in the links list to match the other structural properties
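
One possible reading of the list-processing bullets above, as a minimal Python sketch rather than the spec's algorithm. The function name `process_lists` and the warning strings are hypothetical. Because the proposal lists two alternatives for fragment identifiers in the reading order, the sketch leaves reading-order URLs untouched and only strips fragments in the resource list:

```python
from urllib.parse import urldefrag


def process_lists(reading_order, resources):
    """Return (reading_order, resources, warnings) under the proposed rules."""
    warnings = []

    # Reading order: warn about repeats but keep the author's order as-is.
    seen_in_order = set()
    for url in reading_order:
        if url in seen_in_order:
            warnings.append(f"duplicate in reading order: {url}")
        seen_in_order.add(url)

    # Resource list: strip fragments (with a warning), retain only the first
    # instance, and quietly skip anything already in the reading order.
    kept = []
    seen = set()
    for url in resources:
        base, fragment = urldefrag(url)
        if fragment:
            warnings.append(f"fragment stripped in resource list: {url}")
        if base in seen_in_order or base in seen:
            continue
        seen.add(base)
        kept.append(base)

    return list(reading_order), kept, warnings


order, res, warns = process_lists(
    ["c1.html", "c2.html", "c1.html"],
    ["style.css", "c2.html", "style.css", "notes.html#n1"],
)
print(order, res, warns)
```

In this example the repeated `c1.html` stays in the reading order with a warning, while the resource list is reduced to `style.css` and `notes.html`.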

@mattgarrish
Member

Also:

  • if a resource has already been declared in the reading order, any subsequent declaration(s) in the resource list are quietly ignored when compiling the list of resources in the bounds

@iherman
Member

iherman commented Oct 29, 2019

That works for me.

@HadrienGardeur

From a UA perspective:

  • duplicated resources in the resource list is sub-optimal but not a major issue
  • duplicated resources in the reading order would cause major headaches

Let's say that a resource is present twice in the reading order. The first time at 20% of the book, the second time at 80% of the book.
When I follow a link to that resource, should I consider that this is for the first or second resource?
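
To make the ambiguity concrete, here is a small sketch (the file names and the `positions` helper are hypothetical). A user agent that resolves a link target to a place in the reading order gets two candidates, and the URL alone cannot choose between them:

```python
# Hypothetical reading order in which one resource is listed twice.
reading_order = [
    "intro.html",
    "refrain.html",
    "middle.html",
    "refrain.html",
    "end.html",
]


def positions(url, order):
    """Return every index at which the URL appears in the reading order."""
    return [i for i, item in enumerate(order) if item == url]


hits = positions("refrain.html", reading_order)
if len(hits) > 1:
    # The link target alone does not say which occurrence was meant, so
    # previous/next navigation and resumption become guesswork.
    print(f"ambiguous target: candidate positions {hits}")
```

A unique entry such as `middle.html` resolves to a single position, which is the case the restriction is meant to guarantee.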

@mattgarrish
Member

When I follow a link to that resource, should I consider that this is for the first or second resource?

Yes, I'm personally more comfortable leaving that particular restriction in place. Even in an audio format it seems flawed, as links from a table of contents could introduce this confusion. I'm going to un-strike that and see where the discussion Monday leads on this.

@dauwhe
Author

dauwhe commented Nov 4, 2019

Yes, I'm personally more comfortable leaving that particular restriction in place. Even in an audio format it seems flawed, as links from a table of contents could introduce this confusion. I'm going to un-strike that and see where the discussion Monday leads on this.

I think the TOC case is solvable, with some programming, but the arbitrary link problem is more difficult. I'm OK with keeping the restriction.

@mattgarrish
Member

I think the TOC case is solvable, with some programming, but the arbitrary link problem is more difficult.

It's the pathway to EPUB CFI to make both reliable, but let's not go there...

@iherman
Member

iherman commented Nov 5, 2019

This issue was discussed in a meeting.

  • No actions or resolutions

Transcript:
Garth Conboy: #138
Garth Conboy: #131 (corresponding PR)
Matt Garrish: a number of issues came up last week… dauwhe had raised a couple of questions… going through algo to implement the spec
… the major ones in 138 are that we’re not going to have a MUST NOT on fragment ids on manifest level, but could be restricted at the profile level
… the other one (PR 131) also deals with publication of resources in reading order and resource list
… we had “MUST NOT” but we will leave it at “SHOULD NOT”… if it’s there, we won’t mess around with your reading order
… strongly advised that you don’t declare resources multiple times in reading order… leads to ambiguity
… we want to generally avoid that issue
… other than that, the changes were mostly editorial cleanup
… to recap: fragment ids are allowed, and you can declare multiple resources, and it will be a warning (still)
… we can discuss the resource issues now… i don’t think we need to restrict fragment ids at the publication level
Garth Conboy: nobody on the queue… ivan made an off comment on alternate issues… what did you mean?
Ivan Herman: we also had an issue in the last few days on what happens if you provide an alternate [?]
Matt Garrish: choosing alternates is issue 133… separate from 138
Garth Conboy: let’s stick with 138 for the moment
… nobody on the queue… dauwhe, you had chimed in on this issue. are you happy with where we are?
Dave Cramer: yes, i think allowing fragment ids is important, but i’m fine with the decision on duplicate resources… when we make restrictions I want us to be very conscious about what we’re restricting and why

@iherman
Member

iherman commented Nov 5, 2019

@mattgarrish can this be closed now?

@iherman
Member

iherman commented Nov 5, 2019

(The corresponding PR, i.e. #131, has been merged...)
