Is duration required for audiobooks? #420

Closed
wareid opened this issue Apr 1, 2019 · 21 comments

Comments

@wareid

wareid commented Apr 1, 2019

As discussed in the call on April 1, is top-level duration of an audiobook required?

Pros:

  • Provides UAs with important metadata about the audiobook for processing
  • Provides UAs with essential metadata to show user

Cons:

  • Duration metadata can potentially be inaccurate if left to content creator, causing issues for the UA

There might be other pros/cons, but I am leaning towards this being a requirement.

@llemeurfr
Contributor

After internal discussion, we see the interest of having duration mandatory. It is not only a descriptive metadata used for display only, but also data used by the user agent to display a timeline for the audiobook (the duration of individual tracks is also really useful in this case, to get the relative size of each track inside the complete timeline).

@BigBlueHat
Member

Based on our discussion today, what about this as a proposed option for closing this...

PROPOSAL: use https://schema.org/timeRequired on WP documents (of all kinds) to express expected "duration" of the publication, and continue to use https://schema.org/duration for a more technically accurate expression of length of a MediaObject or Audiobook type specifically.

@iherman
Member

iherman commented Apr 9, 2019

This issue was discussed in a meeting.

  • RESOLVED: Move the duration property to web publications
Transcript: duration in the context of WPUB
Wendy Reid: #307
Wendy Reid: #420
Wendy Reid: duration for entire content vs. duration at the resource level
Ivan Herman: duration currently defined in audiobook draft - nothing in there for duration is audiobook specific
… ability to add duration to a resource is generic - doesn’t have to be audio could be video or whatever else
… does it make sense to have a duration for the book as a whole? for audio books, it is a requirement, but not for a general web publication. but this may not be restricted to audio books. Maybe there could be a video book. The concept is generic, but it may be a requirement for audiobooks and not for general web publications.
Avneesh Singh: generic, but in audio profile should be compulsory. But what is duration of whole book? Does that include branches that are not included in the reading order?
Wendy Reid: Never seen a case where resources are not part of the reading order, but it could be possible. So we should include them in the entire duration.
Brady Duga: what is definition of accurate as far as duration? Want to make sure there is no requirement for the reading system.
Ivan Herman: what happens today?
Brady Duga: we determine the duration
Benjamin Young: agree with duga
Avneesh Singh: Duration is important metadata
Marisa DeMeglio: we will need this for sync media too; it might not be precise, e.g. if the user turns off page number announcements
Lloyd Rasmussen: agree that the reading system should manage those values
Ivan Herman: book level duration is more like general advisory information, not necessarily used by user agent for processing.
… if this is the case, then requiring it to be present sounds like a step too far. Should have/nice to have
… we have the duration set for an audio or video file. Is it required to provide that info as part of the resource description, or will reading system also ignore that?
Benjamin Young: https://schema.org/duration
Benjamin Young: we use schema.org duration property
… what is the point on insisting or not insisting… is it a requirement for reading systems? What is the use case we are trying to solve for?
Wendy Reid: top level duration is useful for user experience, e.g. product detail page. most commonly used on reading system side to break down chapters for example.
Brady Duga: we won’t use resource level durations but will pre-process all the audio to determine. But a web only reading system may not have this capability.
Wendy Reid: maybe the metadata is not used, but it can be provided in case the reading system can make use of it
Avneesh Singh: +1 to move to WP
Wendy Reid: should we move this to WP since it is not necessarily specific to audiobooks?
Proposed resolution: Move the duration property to web publications (Wendy Reid)
Benjamin Young: issue 420 - are we moving the requirement?
Ivan Herman: the term definition should be generic for WP, then we’ll discuss 420
Ivan Herman: +1
Garth Conboy: +1
Tim Cole: +1
Bill Kasdorf: +1
Laurent Le Meur: +1
George Kerscher: +1q+
Franco Alvarado: +1
Ben Schroeter: any opposition to the proposal to move to WP?
Ben Schroeter: vote
Ben Schroeter: +1
Brady Duga: +1
Avneesh Singh: +1
Resolution #2: Move the duration property to web publications
George Kerscher: ivan mentioned attribute about total duration - when a publication is time based media, then the total time would be very useful and I would examine it before downloading an audiobook
Joshua Pyle: +1
George Kerscher: like number of pages of a book
Benjamin Young: duration only allowed on audio books and media, but at the top level for creative works we could use https://schema.org/timeRequired
… “estimated time to consume”
Ivan Herman: sounds like a great match
Benjamin Young: schema.org/timeRequired = “Approximate or typical time it takes to work with or through this learning resource for the typical intended target audience, e.g. ‘P30M’, ‘P1H25M’.”
Wendy Reid: interesting alternative. must think on it.
Ivan Herman: maybe in the WP document we could make an explicit reference to timeRequired at the top level and see if we can live with that, and keep the precise resource durations separate
Bill Kasdorf: +1 to distinguishing between timeRequired and duration
Wendy Reid: I will add to 420
George Kerscher: “Approximate reading time”

@iherman
Member

iherman commented Apr 10, 2019

Note that #420 (comment) is also affected by the precise definition of duration in the draft, see #421 (comment) and follow up. If the decision is not to use schema.org's duration, then I am not sure we should use schema.org's timeRequired either: we have to use a different term defined for publications.

@marisademeglio

(Reposting my comment from #421 )

Can we use only one term for duration, and if it's on a resource, it applies to that resource; and if it's at the root-level (along with other publication-wide properties), it applies to the entire publication?

To me, this worked fine and was clear:
https://w3c.github.io/publ-epub-revision/epub32/spec/epub-mediaoverlays.html#example-7

@geoffjukes

As I read it duration and timeExpected are intended to express different things.
duration is more appropriate for audio-only audiobooks.
timeExpected is more appropriate for Educational publications with exercises/worksheets.

I have tried to follow #421 - there's a lot to digest in there.

For audio-only, the publication-level duration would simply be a sum of all the resource-level duration values - which is easy to calculate if it is expressed as a float rather than a formatted string.
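A minimal sketch of that summation, assuming a hypothetical manifest in which each reading-order entry carries its duration in seconds as a float. The `readingOrder` and `duration` keys and the file names are illustrative assumptions, not names fixed by the draft:

```python
import math

# Hypothetical manifest shape; key names and file names are assumptions
# made for illustration only.
manifest = {
    "name": "Example Audiobook",
    "readingOrder": [
        {"url": "chapter1.mp3", "duration": 130.5},
        {"url": "chapter2.mp3", "duration": 1204.25},
        {"url": "chapter3.mp3", "duration": 987.0},
    ],
}


def total_duration(manifest: dict) -> float:
    """Sum the resource-level durations (in seconds) of the reading order."""
    # math.fsum keeps floating-point rounding error low across many tracks.
    return math.fsum(item["duration"] for item in manifest["readingOrder"])


print(total_duration(manifest))
```

With formatted strings instead of floats, each value would need to be parsed before it could be added, which is the extra step the float representation avoids.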

@marisademeglio

marisademeglio commented Apr 12, 2019

At the moment, there is globalDuration:
https://raw.githack.com/w3c/wpub/adding-duration/index.html#global-duration

And totalDuration:
https://raw.githack.com/w3c/wpub/adding-duration/index.html#dom-linkedresource-totalduration

I think it's confusing to have two terms for such close concepts.

Publication-level duration could be a useful reading system convenience - that was our rationale for including it in EPUB 3 media overlays.

I agree that timeExpected feels like an entirely different thing. And, I can see it also applying at resource and publication levels. E.g. a section of a test vs the entire thing.

@geoffjukes

I feel that duration in seconds (as a double) is the most common way of expressing total runtime of media. See my post in #421

I agree with @marisademeglio. Blackstone's data model has duration at the resource level (the run time of the file in seconds), and at the "book" level (the run time of the entire book, alongside the title, narrator, etc)

I do see value in a separate timeExpected value for non-media assets (such as quizzes, worksheets etc), but such a value is subjective and is therefore more akin to 'title' and other metadata. It may be more appropriate to format timeExpected differently, as was originally discussed in #421

@llemeurfr
Contributor

llemeurfr commented Apr 12, 2019

If we have a duration (whatever the property name) required on each audiobook resource, I don't see the point having also a publication wide property (duration or any other name) defined as "the sum of the resource durations". Computers know how to sum values ;-).

It's totally different to have a publication wide property expressing an expected reading time experience, which makes sense for any type of publication and is only displayed to the user so that he can choose to start reading a publication if time allows (this is useful in magazine type articles).

@wareid
Author

wareid commented Apr 12, 2019

I'm in agreement with Geoff and Marisa on this, I don't love the idea of separating the values or giving them separate names, and I really think it's valuable that we have a total and item-level duration.

To your point on computers being able to sum values, these files are going to be on the web and packaged for reading systems. A reading system's servers might sum it on ingestion, but a web browser is not going to do the same (it'll increase open time if it needs to process all files then sum them); it's a bad user experience when the information can be easily provided. Remember, not all audiobooks will be processed by systems before they're opened. Even in a reading system app, if a file is brought in externally, why should the app be expected to "create" that information every time a new file is loaded? It's also already a common practice in audiobook metadata; it seems to me we're over-complicating an accepted practice.

@geoffjukes

I keep coming back to a subjective timeExpected vs the objective duration. Maybe there is value in both, in both places, where it is appropriate.

duration for media types that have that property (audio, video), and timeExpected for where they do not (quizzes etc).

Summing the duration and having that in the book-level metadata area makes sense to me too. It's cheap and easy, so why not.

@wareid
Author

wareid commented Apr 12, 2019

timeExpected does seem worth exploration for WP proper as an item-level value for interactive media types, but maybe not audiobooks.

@danielweck
Member

I feel that duration in seconds (as a double) is the most common way of expressing total runtime of media. See my post in #421

And I respectfully disagree :)

#421 (comment)

@geoffjukes

so much crossover with #421 😄

@llemeurfr
Contributor

@wareid wrote

it'll increase open time if it needs to process all files then sum them

This is a misunderstanding: I never proposed processing all files to get their durations. On the contrary, I proposed requiring that individual resource durations be set in the manifest. Summing them is therefore a no-brainer for client scripts.

@geoffjukes

geoffjukes commented Apr 12, 2019

Summing them is therefore a no-brainer for client scripts

Agreed, and this was my initial thought. I think that a publication-level datapoint that can be read at the same time as other publication-level datapoints (such as title, narrator, etc) would be easier on application developers (i.e. it's in the data, so use it).

I definitely think that resource-level duration should be required for time-based media. I still think that this resource-level datapoint should be in seconds. (see #421 (comment))

@marisademeglio

If we end up with several different time format strings, it is at least a pain for the developer to cross-reference those and either find a library or write their own. Perhaps they're writing a component that would otherwise have no requirement on it to do time calculations.

@geoffjukes

Publication level runtime as an NPT format string
Resource level duration in seconds as a double.

@llemeurfr
Contributor

@geoffjukes wrote

Resource level duration in seconds as a double.

But note again that we can't use the schema.org duration property as long as schema.org does not accept a decimal value (currently we would have to use "PT2M10S" to express 130 seconds). This is why Ivan is proposing a temporary solution, using a different property name.

@geoffjukes

geoffjukes commented Apr 13, 2019

@llemeurfr Thank you for clarifying. I think I'm on the same page finally!

https://schema.org/Audiobook

https://en.wikipedia.org/wiki/ISO_8601#Durations

If 130.5 seconds is expressible as PT130.5S, I'd be OK with that.
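ISO 8601 does allow a decimal fraction on the smallest component, so PT130.5S is a valid duration. As a rough sketch of converting between float seconds and the time-only part of the ISO 8601 duration syntax: the function names are hypothetical, date components (years, months, days) are deliberately not handled, and fractions are rounded to milliseconds.

```python
import re

# Time-only subset of ISO 8601 durations: PT[nH][nM][n(.n)S].
# Date components (Y, M, D) are intentionally not supported in this sketch.
_ISO_TIME = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def seconds_to_iso8601(seconds: float) -> str:
    """Format non-negative seconds as a time-only ISO 8601 duration."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    total_ms = round(seconds * 1000)  # work in integer milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, ms = divmod(rem, 60_000)
    parts = ["PT"]
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if ms or len(parts) == 1:
        # Trim trailing zeros so 10.500 becomes 10.5 and 10.000 becomes 10.
        secs = f"{ms / 1000:.3f}".rstrip("0").rstrip(".")
        parts.append(f"{secs}S")
    return "".join(parts)


def iso8601_to_seconds(value: str) -> float:
    """Parse a time-only ISO 8601 duration into seconds."""
    match = _ISO_TIME.match(value)
    if not match or value == "PT":
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, secs = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(secs or 0)


print(seconds_to_iso8601(130.5))
print(iso8601_to_seconds("PT130.5S"))
```

Note that the `T` designator matters: without it, `M` means months rather than minutes, so this parser rejects a string such as `P2M10S`.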

@iherman
Member

iherman commented Apr 16, 2019

This issue was discussed in a meeting.

  • RESOLVED: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource.
  • RESOLVED: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional.
  • RESOLVED: Schema.org duration value is _recommended_ metadata for the audiobooks profile
Transcript: duration
Wendy Reid: Issue: #420
Wendy Reid: Open Pull Request: #421
Wendy Reid: should duration be required?
… duration will become part of the core spec
Ivan Herman: there are 2 issues
… one is, if we take duration for one resource, the question that came up was what is the format for the value?
… one possibility would be to use the ISO 8601 Value, which is used by schema
… or we could use RFC 7826
… which is used in the media world
… the majority seem to favor the RFC value, as it’s more readable than ISO
… we had an issue #307 a while ago where that was decided, but it wasn’t clean in the doc
… we can reinforce this decision
… this is one part of it
Wendy Reid: I’ve tried to talk to danbri about this, no response yet
… the RFC value fits better with what we want to do, especially if we also want to reference media fragments
… can we merge the PR and close this issue?
Ivan Herman: do we want to make a new resolution, or the already decided one?
Wendy Reid: let’s stick with the NPT?
Geoff Jukes: I’m confused about the intent
… specifying the duration of the resource
… it’s the file, effectively
… that duration is only specified in seconds
… never anything else
… my concern is that putting in media fragments at the resource level doesn’t make sense
… if the intent is to conform to schema.org, then we should just use ISO
… so why NPT instead of using a double?
… I don’t know why it’s a consideration
Wendy Reid: media fragments will be a thing, although maybe more in TOC etc than in resources
Geoff Jukes: that’s not describing a resource, but metadata
Wendy Reid: we don’t want two different formats for these things
Geoff Jukes: I disagree
… and I’m having trouble with these very long discussions
… I think of a media fragment as a different thing than a resource
Deborah Kaplan: geoffjukes: +1 for calling out our confusing conversations as confusing. Thanks.
Ivan Herman: the NPT format is defined in a way that it can have only a number, which is seconds
… the author may choose to use raw seconds
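To illustrate the point about NPT accepting a bare number of seconds, here is a simplified sketch of parsing the two NPT time forms from RFC 7826 (plain seconds and hh:mm:ss, each with an optional fraction). The function name is hypothetical, and the `now` keyword and NPT ranges are not handled.

```python
import re

# Simplified patterns for the two NPT time forms: plain seconds
# ("130.5") and hours:minutes:seconds ("0:02:10.5").
_NPT_SEC = re.compile(r"^\d+(?:\.\d+)?$")
_NPT_HHMMSS = re.compile(r"^(\d+):([0-5]?\d):([0-5]?\d)(?:\.(\d+))?$")


def npt_to_seconds(value: str) -> float:
    """Convert an NPT time string to a float number of seconds."""
    if _NPT_SEC.match(value):
        return float(value)
    match = _NPT_HHMMSS.match(value)
    if match:
        hh, mm, ss, frac = match.groups()
        whole = int(hh) * 3600 + int(mm) * 60 + int(ss)
        return whole + (float("0." + frac) if frac else 0.0)
    raise ValueError(f"not a supported NPT time: {value!r}")


print(npt_to_seconds("130.5"))
print(npt_to_seconds("0:02:10.5"))
```

Both calls describe the same instant, which is why an author who only ever writes raw seconds would still be producing valid NPT values.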
Tzviya Siegman: i missed last week. is it possible to summarize the discussion?
Ivan Herman: in a way we jumped ahead
… we have to make a choice between ISO and RFC
… then during the discussion a third option became possible, just taking the number of seconds
… those are the three options
… so the question is which of the three?
Geoff Jukes: in addition to that, what is the desire to conform to schema? Is that a design principle?
Ivan Herman: we want the contents of the manifest to be accessible to the knowledge graph
… it’s mostly important for bibliographic metadata
Geoff Jukes: the desire to conform to schema is high, so we can obtain cross-vendor parsing capability
… is that correct?
Ivan Herman: yes
Laurent Le Meur: here we are speaking on duration of resource, not duration of audiobook. It’s not a property of a book.
… so it’s not tied to a need to express audiobook metadata for schema
… so we could use seconds, with a name other than duration (like runtime or length)
… and the audiobook industry would be happy with that
Geoff Jukes: I’d be happy with a new thing called length or whatever, that’s just a double
… it’s what we already do
Ivan Herman: to be clear I am just a messenger
… whatever the group decides is fine
Tim Cole: the decision could be made that in this community we would use duration but constrain the value to seconds
… i think this is OK
… it could be enforced via a context document
… we could also define our own property, and connect it to duration
… there are ways to express constrained versions of other properties
Ivan Herman: I don’t think that works
… schema uses the ISO format, and it doesn’t allow a simple number
… a number can be a subset of RFC, but not of ISO
Wendy Reid: the reason we were leaning on RFC is that it has only two ways to express time, including only seconds
Ivan Herman: I would propose to move on
… we define that property to have a value being a float consisting of the number of seconds
… with a new term like length
Proposed resolution: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource. (Wendy Reid)
Ivan Herman: +1
Laurent Le Meur: +1
Dave Cramer: +1
Franco Alvarado: +1
Tim Cole: +1
Geoff Jukes: +1
Joshua Pyle: +1
Marisa DeMeglio: 0
Avneesh Singh: +-0
Avneesh Singh: no strong opinion :)
… waiting for feedback from media sync people
Wendy Reid: does this impact sync media
Marisa DeMeglio: I don’t think so… this is just properties of resources
Ben Schroeter: +1
Brady Duga: Abstain (don’t plan to use the value)
Marisa DeMeglio: this issue doesn’t need to get more complicated
Resolution #2: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource.
Ivan Herman: the other issue that came up is more controversial
… there may be a notion of duration of the whole audiobook
… it turned out that having that as book-level metadata is something that implementors may ignore
… they may deduce that from the individual resources
… but it may be helpful as a hint to the user, as a value in the catalog etc
… what I did, mostly to generate discussion, was to
… put a global property there, with the same format
… primarily defined for a user interface
… do we need this, or should we remove it from the PR doc?
Laurent Le Meur: I would say that the audiobook schema.org object supports the duration with ISO 8601
… it’s there and it’s optional, and it is what we want
… it’s descriptive metadata
… we could just adopt this and move on
Ivan Herman: +1 to laurent
Geoff Jukes: +1 to laurent
Wendy Reid: simple descriptive metadata
Proposed resolution: Schema.org’s Duration will be a required metadata descriptor for audiobooks (Wendy Reid)
Ivan Herman: +1
Ben Schroeter: +1
Laurent Le Meur: I thought the idea was to keep it optional, as in schema.org
Proposed resolution: Schema.org’s Duration will be a recommended metadata descriptor for audiobooks (Wendy Reid)
Laurent Le Meur: and it’s a ‘duration’ property (of type ‘Duration’)
Ivan Herman: +1
Laurent Le Meur: +1
Marisa DeMeglio: is this a different property?
Ivan Herman: yes
Marisa DeMeglio: -1
Deborah Kaplan: +1
Tim Cole: +1
Joshua Pyle: +1
Wendy Reid: this is schema.org duration descriptor for audio book, the length of the entire work, the sum of all the parts
Laurent Le Meur: see https://schema.org/Audiobook
Laurent Le Meur: and https://schema.org/duration
Geoff Jukes: it’s not the same concept
… it might be the sum of all resources, but it might be different, for example if there’s non-book audio resources
Ivan Herman: +1 to geoffjukes
Geoff Jukes: +1
Geoff Jukes: so it’s ok to have a different name and format, and it’s good for it to be in schema.org so it’s universally digestible
Wendy Reid: it would be called duration, it would be the total length of the book, provided by the publisher
Deborah Kaplan: are we voting on making this required?
Ivan Herman: there is a mess-up
… there are 2 things here
… one is, what is the global descriptive metadata, and what value it takes
… and the only resolution we are proposing is to use duration with ISO as in schema.org
… and then there’s the question of whether this metadata item is required
Geoff Jukes: I would be happy for it to be required
… we have to send it to our publishers/distributors
Wendy Reid: when I said required I meant for the audiobook profile
Laurent Le Meur: q for geoffjukes. Why is it required?
Geoff Jukes: when we send ONIX we include runtime
… and they like to cross-reference it to make sure they got the right book
… if it’s not required we’ll supply it anyway
Tzviya Siegman: +1 to limited metadata!
Dave Cramer: The web platform requires very little metadata, we should require the important things (title, author), this does not seem like required metadata
… I suggest we make it optional
Ivan Herman: +1 to dauwhe
Laurent Le Meur: +1 to dauwhe
Wendy Reid: for an audiobook it’s almost as important as title
… for a user to understand what they’re getting into
… to find out if it’s abridged or unabridged
… or if my phone will keep it
… I think it should be required
Ivan Herman: there is no requirement to provide metadata for the number of book pages
… but the same argument applies, ish
… I agree it is recommended
… but “must” is too far
Tzviya Siegman: I hate to prolong this discussion
… when we were deciding on EPUB metadata, lots of people said that title should be required
… but then you get into lots of nuance with what titles means, but most systems don’t pay attention
… we should look into how systems work with information about length
… and how this will play out in the real world
… maybe the implementors can tell us more about how this information is used
Dave Cramer: It strikes me that many of the arguments for the utility of the information are about file size, not chronological duration. This information can be useful, but requiring it is not traditionally how the web works
… we run into issues of validation
… are we then going to get to a point where validators take the values and compare them
… requiring this is complicated
Laurent Le Meur: I agree on principle we shouldn’t require descriptive metadata
… and we should keep properties required for user agent functioning or content identification
… so we should recommend this, underlining all the advantages of using this
Brady Duga: when considering required metadata, we should ask if it’s impossible to create a book without this metadata.
… if it’s not impossible, we shouldn’t require it
Wendy Reid: I’m OK with recommended, even though y’all are completely wrong :)
Bill Kasdorf: vendors can still require it
Ivan Herman: we have 2 resolutions to take
… we never closed the previous resolution
Proposed resolution: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional. (Ivan Herman)
Ivan Herman: +1
Tim Cole: +1
Laurent Le Meur: +1
Bill Kasdorf: +1
Deborah Kaplan: +1
Geoff Jukes: +1
Ben Schroeter: +1
Dave Cramer: +1
Brady Duga: +1
Garth Conboy: +1
George Kerscher: +1
Resolution #3: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional.
Joshua Pyle: +1
Proposed resolution: Schema.org duration value is recommended metadata for the audiobooks profile (Wendy Reid)
Ivan Herman: +1
Laurent Le Meur: +1
Marisa DeMeglio: +1
Tim Cole: +1
Bill Kasdorf: +1
Ben Schroeter: +1
Deborah Kaplan: +1
Resolution #4: Schema.org duration value is _recommended_ metadata for the audiobooks profile
Wendy Reid: can we move on and never speak of this again?
Ivan Herman: no
… I will make the edits according to the resolutions, is it OK to then merge?
Wendy Reid: +1
Ivan Herman: and then the PR and the issue can be closed then?
everyone: YES

@wareid wareid closed this as completed Apr 16, 2019

7 participants