Add first draft of ocf-lite #30

Merged
merged 4 commits into from Feb 12, 2019

Conversation

llemeurfr
Contributor

No description provided.

@iherman
Member

iherman commented Dec 28, 2018

For a better reference a more readable version is in the html preview

@iherman
Member

iherman commented Dec 28, 2018

See my separate comment: #29 (comment)

I.e., we should be careful what title and extension we use.

@iherman
Member

iherman commented Dec 28, 2018

Worth adding the open but relevant issues to the text:

(I may miss some).


Minor comment: afaik, the MIME registration expects a person as a contact; the current practice in our case has always been to refer to the staff contact. Ie, you should simply put my name & email address.

@mattgarrish
Member

There also shouldn't be any "former editors" of this document. Even an OCF 4 would start with a blank slate. You can mention inheritance from OCF in the acknowledgements.

Member

@mattgarrish mattgarrish left a comment

s/imanifest/manifest/

Member

@mattgarrish mattgarrish left a comment

Typo

([[!RFC3987]] and [[!RFC3986]]).</p>

<aside class="example">
<p>The following example shows how the Web Publication Manifest <code>imanifest.jsonld</code>,
Member

s/imanifest/manifest

Contributor Author

done

@mattgarrish
Member

Another general comment would be to change the file name from "ocf-lite" to something else, as it's not actually called OCF Lite in the document and it's not really a lite version of any existing OCF - it has the potential to cause unnecessary confusion.

@dauwhe

dauwhe commented Jan 3, 2019

This appears to disallow embedded manifests, and thus restricts what types of web publications can be packaged.

@murata2makoto

What is the difference between OCF and this proposal? Aren't we creating another incompatible spec for fun?

@dauwhe

dauwhe commented Jan 7, 2019

Just a reminder that OCF is full of restrictions on filenames, ZIP features, etc. Do we, for example, plan on forbidding BZIP2 compression in OCF Lite?

@llemeurfr
Contributor Author

@dauwhe, re. embedding the manifest, this is right. We may consider that the container contains a canonical representation of the WP, where the manifest is external. External vs embedded are preferences; the abstract model doesn't care. Having the manifest external is interesting as a clue that the ZIP contains a WP, and fetching/parsing the manifest is much easier for a processor if it does not also have to parse an HTML page. Moreover, the audiobook TF has yet to be convinced that a required entry page is a good thing.

@dauwhe re forbidding the ZIP compression, yes we do, as the corresponding wording of OCF is kept as-is.
@murata2makoto OCF can't be used for anything else than EPUB, with required META-INF, container.xml etc. This proposal is aligned with the requirements of WPs, i.e. entry page and manifest.
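
For illustration, the compression restriction discussed above (OCF permits only stored and Deflate-compressed entries and no ZIP encryption, so BZIP2 is excluded) can be checked mechanically. The following is a minimal sketch using Python's standard `zipfile` module; the function names and the sample entry are hypothetical and are not taken from the draft.

```python
import io
import zipfile

# Compression methods allowed by the OCF wording: stored (0) and Deflate (8).
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def disallowed_entries(zip_bytes):
    """Return (name, reason) pairs for entries that break the restrictions."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(
                    (info.filename, "compression method %d" % info.compress_type)
                )
            # Bit 0 of the general purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                problems.append((info.filename, "encrypted"))
    return problems


def build_sample(method):
    """Build a one-entry archive in memory using the given compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("entry.html", "<!DOCTYPE html><title>Sample</title>")
    return buf.getvalue()
```

Calling `disallowed_entries(build_sample(zipfile.ZIP_BZIP2))` reports the BZIP2 entry, while an archive built with `zipfile.ZIP_DEFLATED` yields an empty list.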

@llemeurfr
Contributor Author

More details for the record:

Here is what I suppressed from the original OCF spec :

  • the mime-type file used as magic number; therefore a unix system only knows this is a zip file.
  • the META-INF directory
  • all xml files from META-INF (container, encryption, signatures, manifest, rights, metadata)
  • the section about font obfuscation
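
To make the "magic number" point concrete: in OCF the `mimetype` file is the first entry, stored uncompressed and without an extra field, so its name starts at byte 30 of the archive and its content at byte 38. The sketch below builds such an archive in memory and sniffs it without a ZIP parser; the helper names are hypothetical, and this only illustrates what is lost by dropping the file.

```python
import io
import zipfile


def build_ocf_style_zip():
    """Build an archive whose first entry is an uncompressed 'mimetype' file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.txt", "hello",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def sniff_mimetype(data):
    """Return the media type from a leading stored 'mimetype' entry, or None."""
    if data[:4] != b"PK\x03\x04":           # local file header signature
        return None
    method = int.from_bytes(data[8:10], "little")
    size = int.from_bytes(data[18:22], "little")       # compressed size
    name_len = int.from_bytes(data[26:28], "little")
    extra_len = int.from_bytes(data[28:30], "little")
    if method != 0 or name_len != 8 or data[30:38] != b"mimetype":
        return None
    start = 38 + extra_len                   # OCF requires extra_len == 0
    return data[start:start + size].decode("ascii")
```

A tool such as a file-type sniffer can apply the same fixed-offset comparison; without the `mimetype` entry it can only tell that the file is a ZIP archive.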

Here is what I added to the specification:

  • the requirement for a file named entry.html, which contains the Primary Entry Page of the Web Publication, located at the root of the package.
  • the requirement for a file named manifest.jsonld, which contains the WP Manifest, located at the root of the package.
  • is the entry page required?
  • is the manifest file required, as a WP can embed it?
  • the way relative IRIs work from the manifest and entry page.
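
As a rough sketch of what a processor-side check for these additions could look like, the following lists which of the two root-level files are absent from a package. It assumes both files are required, which the list above still marks as an open question; the function and helper names are hypothetical.

```python
import io
import zipfile

# File names proposed in the draft; both are expected at the package root.
REQUIRED_ROOT_FILES = ("entry.html", "manifest.jsonld")


def missing_root_files(zip_bytes):
    """Return the required root-level file names that the package lacks."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
    return [name for name in REQUIRED_ROOT_FILES if name not in names]


def make_zip(names):
    """Build an in-memory archive containing empty-ish files with these names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, "placeholder")
    return buf.getvalue()
```

Note that a manifest stored in a subdirectory (for example `meta/manifest.jsonld`) is reported as missing, since the check only accepts files at the root.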

Several items will have to be discussed:

  • font obfuscation: see issue 28
  • signature: it would be useful to enable signing the different components of the package -> see issue 31
  • if audio publications and visual narratives will be disambiguated via a different file extension (this is possible via a simple modification of the last section). Apple is used to that and this is really useful.

And later we'll have to tackle multiple renditions: this is something visual narratives will put back on the table.

I kept the term OCF-lite as file name only because this is the nickname we employ until now: it will be changed after we choose a proper name for this specification. But we should do such bikeshedding later.

Note: I hesitated introducing the ISO 21320 spec, but decided not to do so in this draft. I spotted two main items to discuss in this spec:
1/ the section about prohibited characters (B.3) only prohibits characters prohibited in the IRI spec and only highlights that OCF prohibits other characters. In the OCF-lite draft I have kept the OCF section associated with allowed characters as-is.
2/ Table 1 states that the digital signature specified in ZIP is not allowed. There is no mention of this in OCF or this new draft: it may be good to add it.

@GarthConboy

I generally like @llemeurfr 's direction above. Though, I'd be happy to retain the mimetype file (with a new value).

@iherman
Member

iherman commented Jan 8, 2019

This issue was discussed in a meeting.

  • No actions or resolutions
Transcript: Audio TF discussion
Wendy Reid: The audio task force got together to have a follow up from the last meeting to get ducks in a row about packaging. To summarize what we discussed: The main focus is OCF-lite. Nothing decided, but OCF-Lite is the forerunner.
… Whatever we did with OCF-lite, we would keep it separate from the EPUB one, we would use EPUB 3.2 as the basis, but we want it separate to reduce confusion. it’ll be only for audio (could be for WP, more decisions to be made) but it’s being used as a basis, but not related to epub in any way.
Wendy Reid: https://github.com/w3c/wpub/wiki/Packaging-for-WP-Audiobooks
Wendy Reid: As Ivan mentioned, as it’s not in our charter to create a packaging spec, so we have to be cautious. As a group, we did expand the packaging for audiobooks wiki page a little bit. The one thing we did - we added requirements & success criteria.
… The group could help provide a few more criteria for success and requirements we need to meet to see where the potential options line up with our needs. We need to define success and requirements. One thing that came up - do we proceed in audio only or make a WP-wide solution…
… Our requirements for audio maybe incompatible with other WP packaging, but that needs to be discussed. Laurent…?
Tzviya Siegman: OCF-lite draft -> https://cdn.staticaly.com/gh/w3c/pwpub/95b98d4f09bd3ce107f6fb52f6edc391a3240408/spec/ocf-lite.html
Laurent Le Meur: The document I drafted is a simplification of OCF 3.2… I took the spec as-is and removed everything that was only tied to epub. I suppressed the meta-inf directory and all XML files from meta-inf (container, etc). I also removed the mimetype file.
… The mimetype signature file comes from ODF and it could be read by some systems but it’s not, and most reading systems don’t use it, and it’s more difficult to put inside, so we removed it. I also removed font obfuscation.
… I did add a file entry.html and the JSON LD file. I located both at the root of the package. I just also did a small specification on the way relative URIs work from the manifest on the entry page. It’s the equivalent of what’s in the OCF spec. It’s compatible with OCF. Tied to web publications and not epub files.
Ivan Herman: A very small tech question: in OCF we had stuff that there was 1 file that shouldn’t be compressed… (the mimetype file) That’s gone, correct?
Laurent Le Meur: Yes - that was the mimetype file, it’s gone from this spec. The processor, which wants to open the zip can know more than it’s a zip file, but in the real world, no one was using this, so we felt it wasn’t necessary.
Dave Cramer: Quickly on the last point. I have found reading systems that won’t open an EPUB if you change the capitalization of mimetype. OCF without mimetype and container then just becomes a zip file with strict filenames and only certain compression types. So it limits things you can use within zip. The current OCF-lite draft also forbids embedded manifests, since the manifest must be a separate JSON LD file.
… Copying from OCF right now is complicated. I’ve been reviewing it in detail from the EPUB3 perspective - and I think there are lots of ambiguities.
Tzviya Siegman: Building on what Dave just said - I’m wondering what the benefit of writing OCF-lite vs ZIP.
Laurent Le Meur: Zip is an application note - not a standard. It has many options that no one wants to use or no one uses much. Like encryption, splitting, and there are strictly no constraints on file naming, so what has been done in OCF and what has been replicated, is putting constraints on ZIP to make it easier for authors and reading systems to read/write.
George Kerscher: Is there any requirement on the order of files within the zip in the container?
Laurent Le Meur: Not any more.
Brady Duga: I did want to beat a dead horse and point out that the mimetype magic number is used, today, in production code that a large portion of the world uses. We don’t have to include it, but we shouldn’t decide not to use it because it isn’t used. it is, just not by reading systems.
Garth Conboy: I was going to say something similar. I don’t find the mimetype burdensome — I think what you’ve done Laurent is great — removing meta-inf, certainly for audiobooks, but I can see keeping the mimetype and changing content for audiobooks may make sense. The last comment is maybe targeted best at Ivan — there might be more testability from 3.2 So maybe not draft something that looks like a packaging spec, and just point to 3.2 and note the ch[CUT]
Ivan Herman: For other reasons we looked at the charter again, and the charter does not forbid us to do a spec in this area, it’s not the charter that forbids us, but the comments we got at chartering that were very much against it. It is still to be checked, I would leave this issue open for the time. What makes it clearer for the user and the reader to handle.
Dave Cramer: Doing something in markdown and explaining the concepts might get us better feedback from various communities
Tzviya Siegman: +1 dauwhe
Wendy Reid: OCF-lite still remains to be one of the options we do have. We are still exploring other options as listed in the wiki. Another big one we’re hoping for an explainer in is HEIF. We’re hoping to get that in a few weeks as a counterpoint.
Ivan Herman: With OCF-lite and the others, My view of this is that what Laurent did is not only for audiobooks. We’ve had these discussions that yes, we’ll focus on audiobooks, but whatever we do would be, should be, maybe reusable for a manga format within a year or so…
… OCF-lite has this generality which is good while we are waiting for something better, but going down anything audio-specific puts us in a corner that I would not like.
Garth Conboy: +1 Ivan!
Laurent Le Meur: I would add that going through the OCF-lite is also compatible with OCF, so a standard book would be compatible with EPUB3 and the WP serialization. If you have both EPUB3 files AND the manifest in the same container…
… and you have one set of XHTML files, then you’ve got something that is compliant with EPUB3, web publications, and possibly even epub2. So with one container you have all.
Tzviya Siegman: My hesitation with OCF/OCF-lite, while it is compatible, what happens when we look at the stuff going forward. What happens on a website? How does this unpackage on a website? We are supposed to be working on web publications…
… the format we’re hoping for is web packaging (which doesn’t exist yet) but we have to have this as a consideration before we decide on OCF-lite. Even if it has a layer on top of it (javascript overlay…)
Laurent Le Meur: I put some comments in the document — they are not used for web publications on the web — it’s used for packaging. Our charter notes that we have to deal with packaged web publications. It’s not about packaging something on the Web, it’s about packaging something that will be on the web soon. So it’s about logically moving a package that will be on the web.
… Then you need middleware. We have streamers… Yes, you need software set aside that can dynamically create a document from the package.
Garth Conboy: I view this as interesting in the package world, but I view tangential functionality in EPUB. This effort could get some traction in the short run. Reading systems now can inject epubs - so loading an audiobook package this way is not a huge step…
… In the future if we get back to WP proper, maybe enough time will have passed that web packaging is real…
Luc Audrain: +1 to Garth
Ivan Herman: I must say I share the problems of Tzviya. I understand them. The way I see is slightly different from Laurent — we are forced into a corner. From the Web point of view, it would be much nicer if we had a web-packaging.
… E.g., we would expect a bunch of security issues are taken care of, but that’s not there. We need a solution right now for which OCF-lite cuts it perfectly. What will happen in 2 years from now, because I don’t expect it earlier…
… in 2 years, when Web Packaging becomes a Rec, we can just say it’s an alternative packaging format for audio or anything else. Maybe we don’t have to, but the user community will vote with their feet. They will have 2 possibilities…
… we will see which one we will alter, but we cannot wait 2 years. I must admit that I go into this with big reservations as well. On the other hand, that’s where we are.
Dave Cramer: OCF does not solve the problems we need to solve for web publications. EPUB is running into difficulties because we don’t have a clear solution for origin, for example. We also have lots of non-web use-cases, and OCF-lite or some other packaging mechanism that doesn’t involve browsers seems useful.
… Audiobooks, which are just a bunch of MP3s, or comics that are just JPEGs, don’t require complex tools. We may need different mechanisms for different use cases. I suspect a single mechanism won’t solve all situations. What Google is doing is overkill for some situations…
… good luck telling an audio book publisher/author that they have to create HTTP response/requests - that’s not going to work for everyone. We have different use cases based on the types of content.
Garth Conboy: +1 too
Laurent Le Meur: +1 to dauwhe
Ivan Herman: that is actually a very good point, dauwhe
Geoff Jukes: +1 to dauwhe
Laurent Le Meur: Ivan, you said that web packaging we have to wait 2 years. More than that. Once the spec is there, it’ll take time to get into browsers, then in authoring software, etc. With HTTP security, it could take up to 4 years…
Dave Cramer: It’s already partially implemented in Chrome.
Tzviya Siegman: Wendy mentioned in her email, or at the beginning, that we might consider doing something different. The more I learn about audiobooks, I’m thinking it’s smart. Especially with Geoff’s feedback, I’m wondering if we need to consider something slightly different for audiobooks…
… or make the specification so stripped down that audiobooks and the details are really audiobook specific…
Garth Conboy: +1 Tzviya
Tzviya Siegman: instead of arguing to what is best for web publications, we clean up the web publication, and make it primarily about the data in the web publication and the audiobooks. It’s possible that is the focus…
Wendy Reid: +1 tzviya
Bill Kasdorf: +1 to Tzviya. ODRL took the same strategy: very stripped down spec designed to be augmented for specific uses (e.g., news)
Ivan Herman: Just reacting on the last thing — I would prefer not to go to this extreme. I would prefer one packaging format. The reason why I was on the queue is that the points Dave made are good and important. Dave you may want to talk to the editor of the explainer…
Dave Cramer: Tzviya said something about web publication spec may be focusing on the manifest… I might go a little further there we’ve all experienced some difficulties in the group over the last years reaching consensus on our vision and what needs to be done…
… it feels like there is a core of stuff that is required by everything we’re talking about. Every sort of publication from audiobooks to web pubs to digital sequential art depends on a bit of info - having an ordered list (reading order)…
… that reading order can be expressed in many different serializations - epub has the spine, we have a JSON serialization in readium. Blackstone’s audio format has YAML data structure… Maybe we can have this model for structural metadata that can be serialized in different ways and packaged in different ways…
… which might get us out of the trap of User Interface issues. If we have some really simple data model, we can go experiment and build things - because we don’t know what’s coming after that…
Laurent Le Meur: +1 to dauwhe, building on a conceptual model is key
Nick Ruffilo: I have recently decided simple is always better, having worked with EPUB on all sides, spec is there and people will do what they want
… what is important is that the simple and basic stuff works, and things did and didn’t work in more complicated areas, if we keep it simple, accessibility is handled and a bulk of things are handled, and handle the exceptions from there
… decide on our own that exceptions are handled as needed by the industry
George Kerscher: On the accessibility side - i’m highly in favor of simplicity as well. The JSON file and the HTML file should not preclude the ability to provide a text transcript. As soon as we get into textbooks and such, it’s necessary. As long as we don’t preclude it. The same goes for comics/manga…
… as long as we can have descriptions of the images, that’s good
Tzviya Siegman: RABBIT HOLE. I would like to help bring us back to the agenda. We’re still looking at packaging in general. It sounds like lots of people like OCF-LITE (Zip with restrictions?) but there’s some hesitation for Web Publications, but people want to ideally use the same package for all WPs…
… and the big concern is that - it comes back to the question of if there will be middleware…
Tzviya Siegman: Maybe we want to start making the success criteria for packaging.
Laurent Le Meur: About accessibility - we will be ready to address the accessibility by having multiple renditions. Yes it will be good for audiobooks and comics as well - to have a full transcript of the content. It’s to be studied…
… Tzviya you were saying there is an issue with having a packing format - having the same for all different types. the epub file format won’t be replaced by this new container because epub works - it’s only for new kinds of publications - audio and comics. Because epub doesn’t work there - which is why we need something else…
Luc Audrain: +1 for new kind of publication
Laurent Le Meur: If we consider that, we create a new container format. We don’t say it will or won’t be used for other - we know it won’t be used for that. Then there won’t be any more friction.
Ivan Herman: The reason why I would be a little bit hesitant about having an audiobook format is the answer to George’s question: with web publication and the OCF-Lite, it’s a clear yes. It’s not a problem to add text information to the package, because it’s a resource just like an audio…
… OCF-Lite handles that the way Laurent has handled that. As soon as we begin to lose the generality, and say ‘lets have an audio-specific format’ it might bite us later with the same question George asked.
Nick Ruffilo: If OCF-lite is a simpler format, and would work with the content of an EPUB, and I am a future publisher creating my content, why would i not want to package my content in OCF and have it work anywhere.
… What does EPUB have that is better? There might be cases where the full OCF is needed, but if we are going to get traction and implementors do OCF-lite, if it’s good enough for 90% and accessible, does this become the default and OCF becomes edge cases
Garth Conboy: It raises an interesting point. We can’t replace OCF with OCF-lite today, because EPUB is XML based, and we have to find the container - and all these things Laurent proposed removed. If there’s an EPUB 3.5 effort, and we get nice traction with audiobooks, back in IDPF days ‘browser friendly format’…
… if we do something like that - a half-step for EPUB, I can see OCF-lite - we have reasonable agreement that it’s OK for Audio. We should go there, and learn, and if that comes back to an EPUB 3.5 - more power to us but not sure it’s a bridge we can burn yet.
Wendy Reid: w3c/wpub#390
Wendy Reid: One thing I’ve done is that we’re leaning towards OCF-Lite, but I’ve created an issue that is packaging for web publications and success criteria. Let’s check our boxes and define what we need for Audio, WP, comics, etc.
… Let’s define success, let’s define what requirements need to be met. Security, Accessibility, and understand fully the decision we’re making. It could be OCF-Lite, but let’s do the due diligence.
Ivan Herman: Agreed. 1) Somebody should come up with an initial list. The worst thing is to have a blank page. 2) Let us give a time limit on this. I would say 2 weeks. In those 2 weeks, we get what we get and then in 2 weeks, we process it.
Luc Audrain: +1 to time limit !
Wendy Reid: I will add my initial list to the first comment. Then I will work from there. I will also add the time limit to the issue.
Garth Conboy: I would like to constrain us on Audiobooks and possibly comics as well. We’re not looking at a packaging for WP - we don’t have a constituency for WP and PWP. If, longer term, web packaging might be something, lets not get off in the weeds if we look there..

@iherman
Member

iherman commented Jan 29, 2019

This issue was discussed in a meeting.

  • RESOLVED: PWG will adopt a light-weight version of Zip, based on https://www.iso.org/standard/60101.html, with some restrictions and additions for WP
Transcript: packaging
Tzviya Siegman: w3c/wpub#390
Tzviya Siegman: The next item is packaging. Wendy did a nice summary — but since the last meeting we’ve defined success criteria. Here is the issue with success criteria. Those who attended the AB publishing meeting, we got to see how things stand with packaging.
… we got info about the HEIF format — and Dave did a presentation on OCF-lite. We heard from the Chrome team about the current incubating format that google proposed. All that said, we have the success criteria put out.
… We are all but agreed on OCF-Lite, but that’s a bad term for it because it’s possibly misleading…
Dave Cramer: I have a question about what we are specifying — it came up in a packaging thread. There is an ISO standard that basically is very close to being a subset of ZIP that OCF uses. It mentions restricting the types of compression — it has an appendix on restricting filenames…
… also, oddly for an ISO standard, you can get the PDF without paying a bunch of Swiss francs. I think it would be good if we didn’t copy a whole bunch of text from OCF, but borrowed things that work like folder locations…
Tzviya Siegman: Dave — so you’re in favor of using a restricted format of ZIP instead of rewriting a spec?
Dave Cramer: I’m worried that we have an OCF spec, and we could be normatively adding more items. If we’re saying we want to use ZIP the ISO zip has some nice items in there.
Dave Cramer: https://www.iso.org/standard/60101.html
Dave Cramer: which is ISO/IEC 21320-1:2015
Garth Conboy: I think we could probably make — I don’t disagree with Dave — I think we could make a resolution to do something like OCF-Lite. Whether built up from the ISO Zip or down from OCF — I think we’re in the same neighborhood.
… One decision is about creating compatibility with meta-inf. We’re in the neighborhood that we want to use something zip based with minimal restrictions. We can probably resolve that now and there is some work to do if it’s build up or build down.
Nick Ruffilo: .. but we could resolve if it will work for audiobooks, but maybe other profiles for different uses of WP…
Ivan Herman: +1 to Garth
Laurent Le Meur: I’m not against the ISO standard as a base, but there are several items to discuss. The ISO 21320 spec only prohibits some characters, while the OCF original prohibits other characters, so it’s not 100% compatible. In the ISO standard, there is a table that says that digital signatures are not allowed…
… but is not described in the OCF. It won’t be enough for a packaging mechanism, we need to specify what the name of the manifest file is — so we have some details of packaging that needs to be described that won’t be in the ISO spec. This is why in the draft I made before, I had chosen to copy the OCF spec and remove things.
… even if I make reference to the ISO spec, most of the language that is there will still be there.
Ivan Herman: First of all — accepting what Laurent had said — it will put our document/work on a more solid basis if it was based on ISO spec. It should be a starting position. Whether we want to be compatible with OCF on characters — we can go through that…
… The starting position should be the ISO — it will help in talking with TAG and others. The original reason I was on the queue is that our position goes — whatever we define here is not to define specifically and exclusively for audiobooks.
… whatever we do, we should try to be as generic as possible. If tomorrow, another profile comes up that is similar to audiobooks — manga came up — the starting position should be that the document Laurent creates is a lightweight format for publishing in general, which happens to be used by audiobooks.
Nick Ruffilo: +1
Tzviya Siegman: +1 to ISO
Laurent Le Meur: +1 to a generic packaging format for WP
Bill Kasdorf: +1 to Ivan
Wolfgang Schindler: +1 to Ivan — generic packaging format for WP
Garth Conboy: I would agree with that. I didn’t mean to imply it was only audiobooks, but that there could be more formats in the future. If we have to dream up another name, we can. We can get to what Laurent drafted by building up from the ISO spec…
… we clearly will have to have a well-known place for the manifest. I hope that we can say we are building a lightweight zip format based upon the ISO spec with additional restrictions and rules for WP
Laurent Le Meur: If agreed, I can modify the draft to reflect that…
Proposed resolution: PWG will adopt a light-weight version of Zip, based on https://www.iso.org/standard/60101.html, with some restrictions and additions for WP (Tzviya Siegman)
Tzviya Siegman: I think we should talk about if we need a stand-alone document
Ivan Herman: There are things we need to describe somewhere, so we need to have a document for this. Where you find the manifest, etc. It may only be one page but it must be there.
Ivan Herman: +1 to the proposal
Garth Conboy: +1 to proposal (too)
Wolfgang Schindler: +1 to the proposal
Dave Cramer: Just thinking about Tzviya’s comment about what we need for a document. OCF is 2 specs: there is an OCF abstract container, and then the zip container. The latter includes all the ISO stuff, the former is ‘everything has to be in the same folder and you have to put a container.xml here’
… not sure how we move the abstract container stuff into our spec…
Laurent Le Meur: +1 to PWP…
Laurent Le Meur: PPWP for Pragmatic PWP
Luc Audrain: +1
Tim Cole: +1
Tzviya Siegman: we do have a document — that is a shell — PWP — packaged web publications — that way we get around writing a package document. We have the information associated with packaging — anything else — but Matt might get annoyed if we change the short title.
Ivan Herman: +1 to dauwhe
Dave Cramer: My concern about putting this under PWP — will this make it feel like we’re rejecting all other attempts at the web for doing packaging?
Garth Conboy: When I typed PWP I almost typed “profile 1” but I am sympathetic to Dave’s comment. We can be clear that “this is what we’re doing now, we hope to use it in the future, but it’s not a hard decision that there won’t be another format in the future”
Ivan Herman: That’s why there was an email where we set up the limits and milestones for the upcoming year as a “lightweight packaging format”. There might be a heavyweight coming in the future, but I agree with Dave. We should be careful. I’m cautious using PWP.
… We have talked too much about it being all the solutions to the miseries of the world, so coming up with this will lead to ugly discussions.
Tzviya Siegman: +1
Laurent Le Meur: +1 to the vote
Tzviya Siegman: the document we’re talking about is very short — Light Weight Packaging Format.
Joshua Pyle: +1
Bill Kasdorf: +1
Wolfgang Schindler: +1
Mateus Teixeira: +1 to proposal
Nick Ruffilo: +1
George Kerscher: +1
Gregorio Pellegrino: +1
Resolution #3: PWG will adopt a light-weight version of Zip, based on https://www.iso.org/standard/60101.html, with some restrictions and additions for WP
Tzviya Siegman: Moving on — what are our next steps?
Laurent Le Meur: I have to modify the draft, remove everything about the characters, and replace that with a reference to the ISO zip format. Rename what is PWP inside this document: we can use something like LWPF until we choose a final name. I can keep it as ocf-lite but we can rename it to something else when we choose the name.
… no one will see PWP, but next week would be good to choose a final name.
Ivan Herman: Great! A question we will have to decide is whether this document is on a Rec track or not. The ISO part makes this much easier. That means the only thing we have to test as part of the CR procedures are the ones we add — not the packaging/unpackaging. Which is very helpful.
… We will have to decide if it’s rec track or not. Personally I think it should be a rec track document.
Tzviya Siegman: I think referencing the ISO document makes it much shorter. The only things needed in our document are adjustments — so it becomes very slim.
Laurent Le Meur: but there’s no mention of font obfuscation, but still there is an issue where we have to discuss that…
Tzviya Siegman: For now, leave it out, we can add it later…
Dave Cramer: as someone who has spent too much time with the OCF spec, I hope we can write something clear about what is expected from authors and user-agents. OCF is a bit loose about what is happening with packaging
Laurent Le Meur: as a first step for next week, we should discuss which kind of template or reading system behaviour. There are many ways to specify the user agents, so I would like some input from the group on which kinds of writing we should put.
Tzviya Siegman: There will be a placeholder in the explainer — and we can work on getting the document drafted in the next few weeks

@llemeurfr
Contributor Author

llemeurfr commented Feb 3, 2019

The group wanted a lightweight specification: here it is, the new commit makes it really bare-bones. It says "this is based on ISO 21320, there must be certain files in there, the files are referenced via IRIs, the media-type is ..., the extension is ... ".
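To illustrate what "based on ISO 21320" could mean for a checker, here is a minimal sketch. It assumes the two ISO/IEC 21320-1 restrictions that are easiest to test from the central directory: entries use only the "stored" or "deflate" methods, and no entry is encrypted. The function name `container_problems` is hypothetical, and the draft itself remains the authority on what is actually required.

```python
import io
import zipfile

# Assumption: only "stored" and "deflate" are acceptable compression methods.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def container_problems(data: bytes) -> list:
    """Return a list of human-readable problems found in a candidate package."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(
                    f"{info.filename}: compression method {info.compress_type} not allowed"
                )
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                problems.append(f"{info.filename}: entry is encrypted")
    return problems
```

An empty list means the two checks passed; it does not establish conformance with the rest of the ISO profile or with the additions for WP.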

I anticipated here the resolution of:

If the consensus on these issues is not the one I suspect, I'll change the wording in the PR.

I didn't see what could be written as UA conformance requirements. This is to be discussed by the group.

I replaced any mention of PWP with LPF (Lightweight Packaging Format) and used Web Publication Lightweight Package (alias Package) for defining the zip container enhanced with the PEP and WPM.

I left ocf-lite.html as the file name for the moment. I'll change that once the name of the spec has been finally agreed on.

@iherman
Member

iherman commented Feb 4, 2019

Here is a URL that can be used to see the HTML content:

https://raw.githack.com/w3c/pwpub/b3fe9881043c659c019067a640d78e96ed4faf71/spec/ocf-lite.html

However, for some reason, respec is pending, i.e., one cannot see the final version of the content :-(

@iherman
Member

iherman commented Feb 4, 2019

I have two minor comments (the user agent conformance is another issue, to be discussed separately):

If both entry.html and manifest.jsonld are present in the package, the former MUST contain a reference to the latter.

I would add something like "...following the rules described by the definition of the PEP."
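A rough sketch of how a checker might test the "former MUST contain a reference to the latter" rule. It assumes the reference is an HTML `<link>` element whose `rel` includes `publication`, which is how the Web Publications draft linked to a manifest at the time; the function name `entry_references_manifest` and the exact string comparison of `href` (no IRI resolution) are simplifications made for illustration.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href values of <link> elements whose rel includes 'publication'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "publication" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def entry_references_manifest(entry_html: str, manifest_name: str = "manifest.jsonld") -> bool:
    """True if the entry page links to the manifest (exact href match only)."""
    collector = _LinkCollector()
    collector.feed(entry_html)
    return manifest_name in collector.hrefs
```

A real implementation would resolve each `href` against the location of the entry page inside the package before comparing.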

Person & email address to contact for further information:

I think the rules are that this should be a person, not a general mailing list. Ie, it should be me.

@iherman
Member

iherman commented Feb 12, 2019

This issue was discussed in a meeting.

  • RESOLVED: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules
  • RESOLVED: WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package.
  • RESOLVED: Laurent will merge the pull request as soon as he can
PEP in a package
Wendy Reid: #33
Wendy Reid: First topic today is final topic from last week: issue 33 from the Packaged Web Publication repo, about primary entry page becoming optional…
… a quick recap, and clearing up some questions. The main proposal right now is Ivan’s…
… PWPs may give you the option to make index.html or a JSON manifest the primary entry page…
… an alternate proposal was brought up on GitHub, the so-called ‘minimal’ PWP. The index.html only exists to point to the manifest
… this is specific to PWPs – just for the package context, not for web publications as a whole. Does anyone have comments?
Ivan Herman: Matt came with a proposal which I personally consider complementary to the previous proposal, but some people might disagree
… but I would prefer Matt to make the proposal
Matt Garrish: In the discussion around primary entry pages and whether it’s required, we may be overlapping too much with audiobooks and web publications, expecting audiobooks to always be web publications…
… what I proposed was separating the manifest so the primary entry page doesn’t always have to be present with a packaged audiobook/epub… but it can be
… if the publisher wants it to be a conformant PWP, the publisher can include the entry page
… Ivan posted a better clarification of this this morning, which is that the packaged formats are somewhat separate from a WP… the package format doesn’t always have to be a WP
… everything becomes compatible. In its packaged form, the audiobook can be valid. It’s essentially making everything more abstract…
… getting out of the mess of how something remains logically consistent, even if these other formats don’t want all these other WP requirements to be present
Wendy Reid: #33 (comment)
Ivan Herman: I want to re-emphasize: everything that Matt said is obviously true, but one important thing is missing: a reading system MUST be able to understand the packaged version of full web publications…
… must be able to understand the primary entry, find the manifest out of that, so that a WP in the traditional sense, by just zipping it, even if I called the manifest file something else (although the index file must be kept) it’s still valid
Laurent Le Meur: I totally agree with Matt’s point. This is conceptual. Nevertheless, when we describe the spec, it will be very complex to express that conceptual model and at the same time to explain what Ivan said: a reading system must be able to understand everything with a primary entry page
… I propose we follow what Ivan suggested before: stating the primary page OR the manifest: one or the other…
Ivan Herman: +1 to laurent_
Laurent Le Meur: which for processing is easy to understand. It doesn’t mean we don’t have to explain this model, but I’d be careful not to make the concept too complex…
Wendy Reid: We’re back to the original proposal. The resolution would be: a PWP may include either the primary entry page or manifest but must contain one of those two
Ivan Herman: I think the proposal has two equally important parts.
Wendy Reid: So there are two resolutions
Proposed resolution: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules (Wendy Reid)
Ivan Herman: -> Matt’s description for the proposal: #33 (comment)
dkaplan31: +1
Tzviya Siegman: +1
Ivan Herman: +1
Geoff Jukes: +1
Joshua Pyle: +1
Rachel Comerford: +1
Ric Wright: +1
Franco Alvarado: +1
Mateus Teixeira: +1
Simon Collinson: +1
Ben Schroeter: +1
Luc Audrain: +1
Bill Kasdorf: +1
Gregorio Pellegrino: +1
Wendy Reid: Resolution accepted
Resolution #2: Restructure the document to reflect the publication structure as primary, with web publications and packaged web publications as modules
Proposed resolution: WP keeps PEP as a requirement, PWP will give the option of using the PEP or the Manifest, but one must be present in the package (Wendy Reid)
Ivan Herman: -> Ivan’s consensus proposal: #33 (comment)
dkaplan31: -1
Laurent Le Meur: +1
Ivan Herman: +1
Garth Conboy: How does this fit in with what Laurent just said about or/both?
Wendy Reid: OR/BOTH would work here, but at least one has to be present
Deborah Kaplan: I’m -1 unless it becomes EXACTLY one must be present
… one, but only one, must be present…
… from my experience of working with small producers, people will be confused, which means publications will be wrong, which means reading systems will behave inconsistently…
… creators will have to come up with workarounds due to inconsistent implementation…
… the option of having two will end up with badness.
Ivan Herman: The resolution which I proposed said that the processor MUST look for a primary entry page and if it finds it, MUST process according to the rules. If it doesn’t find it, it looks for a manifest
Deborah Kaplan: As long as it’s 100% clear to creators and reading systems what will happen, that’s fine
Laurent Le Meur: I was very clear about that
dkaplan31: in that case I am changing my vote to a +0 from a -1
Wendy Reid: Would it be better to rephrase the resolution as either or but at least one must be present?
George Kerscher: As someone producing a publication, I’m going to start with my manifest. For an audiobook, I zip that up and distribute it to various places and they process it…
… if I want to add a primary entry page then I could serve it up on the web and all is well. To my mind, I’m progressively enhancing the publication
Ivan Herman: That workflow is correct
Matt Garrish: My question is one of consequences. When we require specific names, it’s going to mean that if you unpackage it on the web, you can only do this with one WP in a directory due to collisions
… are we putting a limitation that we got away from earlier back into play – that you can’t have multiple articles in one directory?
Deborah Kaplan: +0 and not +1 because I still dislike giving people choices, because small creators are confused by choices, while meanwhile large publishers can create a PEP trivially as part of production workflow whether they need one or not. But not -1 as long as clear flow is documented.
Ivan Herman: Matt is right. If we don’t have a name restriction, we have to do something to the package itself to find where to start. This is zip, we aren’t having web packaging, so I’m not sure what else we can do
Matt Garrish: It’s a circular problem: if we don’t have specific names, how do you find what you’re looking for - if you have something else finding the names, how do you prevent those from colliding?…
… trying to prevent the index.html problem from recurring, but I’m not sure how much of an issue it is…
Tzviya Siegman: This makes me uncomfortable too – it’s something we’ve always tried to avoid doing. In the world of scholarly publishing, if I have a journal of 30 articles, each published on their own, each will have this problem…
… I feel like this is going to come back to bite us…
Benjamin Young: My question was similar: if we have specified names for these things and a tree of inheritance where down a certain road you have index.html and down another road you don’t…
… is it possible to make a web audiobook in that world, or do they no longer intermingle?…
Charles LaPierre: Thinking about a journal made up of multiple articles, wouldn’t each article be its own subdirectory, hence no collisions?
Ivan Herman: This whole filing thing reflects that what we’re using is a packaging format that isn’t Web friendly. And we know that, which is why we consider the current format as a lightweight temporary solution…
… I’d be happy if we had today a format which allowed me to refer to a URL for every file, and maybe we’ll have one before I retire. But we’ve agreed to define a lightweight packaging format now, and we have to live with it… we don’t really have a choice
… we could require a specific way of zipping which puts the file first in the zip file, which makes the publication more complicated, because I can’t just take a directory and zip it… this isn’t the solution…
Wendy Reid: We have to find a compromise
Ivan Herman: We have to accept the deficiencies of the system right now
Garth Conboy: I agree with Ivan. We dislike the alternatives more – anything that makes it harder is a no-no…
… there’s a manifest.json and index.html which are both magic names…
… the actual manifest can be standalone or included in the PEP…
… what are the changes that we propose to ensure there is no possible duplication?
Ivan Herman: What I’ve proposed: the first step the reading system does is locate the PEP. If it finds that, then it follows the processing steps that are described in the WPUB document…
… at first look at your own file, otherwise look for a Manifest file and that’s your manifest…
Matt Garrish: We’re making our WP format dependent on the packaging… I can live with this, but what if a better packaging format comes along in future?…
… would we drop these restrictions?
Ivan Herman: If we find a packaging format that allowed that, then yes
Laurent Le Meur: In future, this packaging will be used by publishers as a booster for leaving earth. When the publication is exposed on the Web, pure web packaging becomes important then
Benjamin Young: A general question: are we open to analyzing other formats for web archiving and distribution, the primary component being that they keep URLs around, or continue with zip?
Wendy Reid: If you recall a few weeks ago, we did open up the request for analysis of the different potential formats. They were analyzed based on the pros and cons of that table. If we missed anything in that table, it’s good to know about…
… we made the decision based on the ≈7 formats we looked at in that table
Ivan Herman: Let’s not reopen closed issues. For now we’ve decided to go with what we have, knowing that eventually the committee will produce a webby packaging format
… we explicitly said that if and when that happens, then this working group or its successor will look at it and consider it…
… but we need something today if we want to produce anything before the end of the life of the working group, less than 18 months from now
Proposed resolution: WP keeps PEP as a requirement, PWP will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present look for standalone manifest]), but one must be present in the package. (Garth Conboy)
Ivan Herman: we decided on something and shouldn’t reopen today
Bill Kasdorf: Quick question: if we’re seeing that not all packaged audiobooks are web publications, then we shouldn’t call them packaged web publications, right?
Laurent Le Meur: in fact we don’t.
Bill Kasdorf: Then they aren’t really PWPs, then
Proposed resolution: (Less typos version) WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package. (Garth Conboy)
Ivan Herman: +1
Garth Conboy: +1
Charles LaPierre: +1
Tzviya Siegman: +1
Matt Garrish: +1
Laurent Le Meur: +1
Rachel Comerford: +1
Ben Schroeter: +1
Joshua Pyle: +1
Bill Kasdorf: +1
Tim Cole: +1
Geoff Jukes: +1
Luc Audrain: +1
George Kerscher: +1
Mateus Teixeira: +1
Gregorio Pellegrino: +1
Resolution #3: WP keeps PEP as a requirement, Lightweight Packaging will give the option of using the PEP or the Manifest (with rules agreed to resolve any possible duplication [start with looking for PEP, and process that first; if not present, look for standalone manifest]), but one must be present in the package.
Wendy Reid: We’ve made a decision, with 20min to spare… moving on to our next issue…
Ivan Herman: I propose that we merge the two requests from Laurent whenever he feels comfortable…
… reading that document via the pull request is a pain, and it’s better if we merge it in
Laurent Le Meur: I prepared the merge last week. We can work from that
Wendy Reid: If no opposition, we’ll merge as soon as Laurent is ready
Resolution #4: Laurent will merge the pull request as soon as he can
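
The lookup order agreed in Resolution #3 (look for the PEP first, fall back to a standalone manifest, and reject a package that has neither) could be sketched roughly as follows. The file names `entry.html` and `manifest.jsonld` are taken from the review comment earlier in this thread and may not be the final names; `find_starting_point` is a hypothetical helper, not something defined by the draft.

```python
import io
import zipfile

# Assumed names, taken from the review comment in this thread; not final.
PEP_NAME = "entry.html"
MANIFEST_NAME = "manifest.jsonld"


def find_starting_point(data: bytes) -> tuple:
    """Return ("pep", name) or ("manifest", name) for a zipped package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # namelist() returns full paths, so only root-level files match below.
        names = set(zf.namelist())
    if PEP_NAME in names:
        # The PEP wins even when a standalone manifest is also present.
        return ("pep", PEP_NAME)
    if MANIFEST_NAME in names:
        return ("manifest", MANIFEST_NAME)
    raise ValueError("package contains neither a primary entry page nor a manifest")
```

When the result is `"pep"`, the reading system would continue with the manifest discovery steps described in the WPUB document; when it is `"manifest"`, the file itself is the manifest.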
