Books in the Cloud: What is OPDS and Why Should I Care?

Books in the Cloud

One of the benefits of an e-book over a traditional publication is that it is disconnected from the physical world. It has been virtualized. It is a bunch of bits that reside “somewhere,” and you copy them onto your reading device, which converts them into a visual rendition of the text. The somewhere can be on your PC at home, on a laptop you carry with you, on a thumb drive in your pocket, or on a server that you can access through the internet. Once on the internet, the location of the book can no longer be expressed in longitude and latitude, or country and zip code. It is no longer in real space, it is in cyberspace! Since cyberspace evokes scenes from TRON or books by William Gibson, the world marketing minds of the Internet® decided to call it “the cloud.” To find something in the cloud, you need its web address (URL). You can then access it wherever you are, and it can never be misplaced or forgotten on a nightstand.

OPDS

The Open Publication Distribution System (OPDS) is a web service with which e-books can be shared with other connected systems. An OPDS client application can communicate with any server that implements this service. This is how an app on a mobile device, such as Stanza (iOS) or Aldiko (Android), can “get books” from bookstores or libraries.

While OPDS is format agnostic, the reader applications are usually not. As was mentioned in the article Understanding e-Book File Formats, the “universal” format for e-books is EPUB, and most OPDS-enabled readers will only recognize e-books in this format. This forces Kindle users to get all their books from Amazon. For most, this is not much of a hindrance since Amazon has an enormous selection and the store is well integrated into the Kindle, but it blocks Kindle users from libraries and small publishing houses (such as WritelyDone).
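To make the idea of a catalog a little more concrete: an OPDS catalog is an Atom (XML) feed in which each book entry carries download links marked with an "acquisition" relation. The sketch below is only an illustration of how a client might pick out the EPUB downloads. The sample feed, its titles, and its paths are made up, and a real client would fetch the feed from the catalog URL instead of using a string.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACQUISITION_REL = "http://opds-spec.org/acquisition"
EPUB_MIME = "application/epub+zip"

# Hypothetical catalog feed, invented for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Catalog</title>
  <entry>
    <title>First Book</title>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip"
          href="/books/first.epub"/>
  </entry>
  <entry>
    <title>Second Book</title>
    <link rel="http://opds-spec.org/acquisition"
          type="application/pdf"
          href="/books/second.pdf"/>
  </entry>
</feed>
"""


def epub_downloads(feed_xml):
    """Return (title, href) pairs for entries that offer an EPUB download."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="(untitled)")
        for link in entry.findall(ATOM + "link"):
            rel = link.get("rel", "")
            # Acquisition relations may carry a suffix such as "/buy".
            if rel.startswith(ACQUISITION_REL) and link.get("type") == EPUB_MIME:
                results.append((title, link.get("href")))
    return results


for title, href in epub_downloads(SAMPLE_FEED):
    print(title, href)
```

The second entry is skipped because it only offers a PDF, which mirrors how an EPUB-only reader treats a format-agnostic catalog.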

OPDS in Action

As an example, I’ll go through the process of connecting to WritelyDone’s OPDS service using Stanza on an iPod Touch.

Once the Stanza application is running, select “Get Books” from the icons along the bottom. A list of pre-configured book sources appears.

Selecting the “Shared” button from the top row of buttons opens a user configurable list of book sources. In my case, I already had Feedbooks set up, so it shows up in the list. Pressing the “Edit” button allows you to add or remove sources.

After pressing the green plus button to add a source, you can enter the URL of the OPDS catalog you are trying to add. Enter that data and press “Save” and then “Done.” The WritelyDone catalog is now added to the list.

Selecting WritelyDone as a book source opens a screen where you can select from the available content on WritelyDone. The “Bookshelf” option contains any titles you have added to your bookshelf on WritelyDone.com. You will be required to log in to view your bookshelf, so enter your WritelyDone user name and password when prompted.

Selecting a book from your shelf opens a page with a description and a “Download” button in the upper right. After pressing “Download” you will be prompted to confirm the action, and then the download will take place.

Different sources

The process for adding book sources is similar in all readers. All you need to know is the location of the OPDS catalog and you can add a source. Here are some OPDS catalogs to try out

Understanding e-Book File Formats

One of the first things that confronts a writer wishing to e-publish their work is the confusing array of file formats and meta-data and the seeming lack of any standardization. In this article I will explore what an e-book really is, information regarding the different formats, tools to convert between formats, and how minor changes made as you write will make things much easier when you are ready to publish.

Why can’t I just save my doc file as an e-book?

When I was young, most writers used a typewriter to create a manuscript. There is some nostalgia surrounding typewriters: the snapping of the keys, the sound of the bell for each finished line. This was the sound of progress being made. When people were using typewriters, content was separated from format, layout, and design. You wrote a manuscript, double spaced, with whatever typeface was in your typewriter, typically in 10 or 12 point font. If you wanted to indicate special formatting, you would add hand-drawn markup to the document, or use some simple character based markup such as asterisks to indicate *bold*, underscores for _underline_, and slashes for /italics/. At the publishing house, they would also add markup to the hard copy, indicating margins, fonts, page breaks, vertical spacing, table layout, images, etc. Authors worried about content, and publishers, for the most part, handled the presentation.

Nowadays, almost everyone uses a word processor of one type or another, with Microsoft’s Word being used by the majority. Publishers still want manuscripts in the same format (double spaced, 1 inch margins, etc.), but with the advent of word-processing, the markup can be embedded in the file. So when you italicize a phrase, as I did in the previous sentence, there is a code embedded in the text stream marking the start and end of the italic text. When viewed or printed, the phrase is shown in italics. This is known as presentational markup, and is what is used most often on word-processors. With presentational markup, you can change type family, size, weight, style, decorations, etc. You can go nuts and dO crazy things. This allows writers to make bold, large titles, and chapter headings, or put the telepathic robot conversation in some odd font / style / weight to differentiate it from normal dialog.

The problem with presentational markup is that it is often used where descriptive or semantic markup should be used. Semantic markup differs from presentational markup in that it labels the individual parts of the document, such as the title, a paragraph, an image caption, or a heading, without defining presentation. For instance, the title is distinguished from the rest of the text by surrounding it with the appropriate markup codes, or tags. In HTML the markup tags are human readable. The tag name is surrounded with angle brackets, <tag>, to open an element, and a slash is placed before the tag name, </tag>, to close it, as follows.

<title>This is a Title</title>

With semantic markup, the presentation is defined elsewhere, either in a separate file (known as a style sheet), or at the beginning of the document. In this way, equivalent parts of the document will have the same styling throughout. It is therefore easy to define and change the styling of every piece of the document that shares the markup, such as paragraphs, or chapter headers. It also becomes easy to generate a table of contents for a book by creating links to each chapter heading. This is important because the concept of a page is generally no longer meaningful due to variations in reading device sizes and capabilities.
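As a rough illustration of why semantic markup makes a table of contents easy, the sketch below collects every chapter heading from an HTML document and turns each one into a link. It assumes, purely for the example, that chapter headings are marked with <h1> tags that carry an id attribute. The sample text and the function and class names are invented.

```python
from html import escape
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (id, text) pairs for every heading of one kind."""

    def __init__(self, heading_tag="h1"):
        super().__init__()
        self.heading_tag = heading_tag
        self.headings = []
        self._in_heading = False
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == self.heading_tag:
            self._in_heading = True
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._in_heading:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self.heading_tag and self._in_heading:
            self.headings.append((self._current_id, "".join(self._text).strip()))
            self._in_heading = False


def build_toc(document):
    """Return one HTML link per heading that has an id to link to."""
    collector = HeadingCollector()
    collector.feed(document)
    collector.close()
    return [
        '<a href="#%s">%s</a>' % (escape(anchor), escape(text))
        for anchor, text in collector.headings
        if anchor
    ]


# Made-up sample document with two semantically marked chapter headings.
SAMPLE = """
<h1 id="ch1">Chapter 1</h1>
<p>Some text.</p>
<h1 id="ch2">Chapter 2</h1>
<p>More text.</p>
"""

for line in build_toc(SAMPLE):
    print(line)
```

If the headings had instead been ordinary paragraphs made large and bold, there would be nothing reliable for a program like this to look for.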

This discussion of markup is necessary because all e-book formats require the document to have some sort of semantic markup. If you are self-publishing or want to understand why you can’t simply publish your MS-Word doc file as an e-book, you need to understand a little of what’s going on inside the e-book files themselves. The “e-book” is a container that supplies the document text, styling information, cover art, and meta-data to the reading application or hardware. A *.doc file is a document with hardly any semantic markup, containing mostly proprietary presentational markup.

File formats: the big 3

There are three major e-book formats that are supported on the majority of reading systems: ePUB, MOBI, and PDF. ePUB is an open format defined by the International Digital Publishing Forum (IDPF). It is the primary format used on the iPad, Sony Reader, and the Barnes & Noble NOOK, and can be read by any PC or internet based e-book reading software (e.g. Calibre, Stanza, Bookworm, Ibis). Basically, all e-readers except the Kindle can read ePUB files without fuss.

MOBI, the Mobipocket reader file format now owned by Amazon, can have the *.azw, *.prc, or *.mobi extension. AZW is Amazon’s version of the MOBI format that can be read on the Kindle. It is essentially the MOBI file structure with its own DRM scheme, and no JavaScript support. The Kindle can also read unprotected *.prc or *.mobi files directly. MOBI is technically an offshoot of the Open eBook standard, the predecessor of ePUB, so the two formats share many of the same conventions.

PDF isn’t really an e-book format at all. It is a document format based on PostScript (PS). PDF is useful when you need to keep the “page” concept, and positioning on the page is important. It also supports scalable vector graphics, so it is good for rendering technical drawings and diagrams. This really isn’t a good format for e-readers. Most will read PDF files, but doing so often requires horizontal panning, which is no fun. It is useful if you need the e-reader version to match the printed version, or you need scalable graphics and mathematical formatting.

There are two other formats worth mentioning at this point, plain text (*.txt) and HTML (*.html, *.htm). Plain text has the advantage that it is readable on all e-readers. There are several formatting issues that need attention with respect to line wrapping, and there is no support for images, links or TOC, but for a simple document, it works well. HTML is important because not only is it the basis for web display, it is the underlying format for both ePUB and MOBI! Plain old HTML files can be viewed by the majority of e-readers without modification.

There are many other e-book formats, but with ePUB and MOBI, you have a book that can be read without conversion on any device currently available. There are methods for reading ePUB on the Kindle utilizing the built-in browser, but they rely on an active internet connection. Converting an ePUB to the MOBI format is simple enough that there is really no reason not to publish in both formats. Publishing only in PDF should be avoided where possible since it is a fixed-layout format, and is poorly supported on many devices.

Format      | File extensions         | Devices that CAN read                 | Devices that CAN NOT read
Text        | *.txt                   | All                                   | None
HTML, XHTML | *.htm, *.html, *.xhtml  | Kindle, iOS, Android, Nook            | Sony, iREX, Kobo
ePUB        | *.epub                  | Android, iOS, Nook, iRex, Sony, Kobo  | Kindle
Mobi-pocket | *.mobi, *.prc, *.azw    | Kindle, Android, iOS, iRex            | Nook, Kobo, Sony
PDF         | *.pdf                   | All                                   | None

Notes on the PDF row: “All” means all devices except the Kindle 1.0 and WISEreader. The reading experience varies greatly and is typically much worse than with other e-book formats.

E-book files: What’s inside?

The two e-book formats discussed above, MOBI and ePUB, are really collections of files that are rolled into a single file for distribution. ePUB files are actually standard *.zip archives and can be opened by changing the file name extension to “zip” or by using 7-Zip, which can open ePUB files without changing the extension. MOBI files use a proprietary compression scheme, but are essentially the same concept, so I’ll limit the remaining discussion to ePUB.

Notice how the document is a bunch of HTML files instead of one. Each HTML file is a section or a chapter, and each section terminates on a page break. You normally don’t want text from chapter 2 to reflow into the bottom of the last page of chapter 1. You want to have it start on a new page. Putting each chapter into a separate file forces the e-reader to do this.
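If you would rather not rename files, a few lines of Python can list what is inside an ePUB, since it is an ordinary ZIP archive. This is a minimal sketch. The demo archive it builds in memory, including the file names chapter1.html and chapter2.html, is made up so the example can run without a real book. With a real book you would pass its file name instead.

```python
import io
import zipfile


def list_epub_contents(epub_file):
    """Return the names of the files stored inside an ePUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def make_demo_epub():
    """Build a tiny stand-in archive in memory (placeholder contents only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("content.opf", "<package/>")
        archive.writestr("toc.ncx", "<ncx/>")
        archive.writestr("chapter1.html", "<html><body>One</body></html>")
        archive.writestr("chapter2.html", "<html><body>Two</body></html>")
    buffer.seek(0)
    return buffer


for name in list_epub_contents(make_demo_epub()):
    print(name)
```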

Writing for the Web

Since the two big formats are containers for HTML documents, it makes sense to keep that in mind while you are writing. Converting your document to HTML might take a lot of effort if you aren’t planning for the conversion as you write. For instance, if you write an entire novel in one file, separating chapters by inserting page breaks, and typing chapter headings by changing the font size to 24pt and making it bold, you will likely spend a lot of time trying to get your HTML to look right in the e-book.

Here are some tips to make it easier:

  1. Create a separate file for each chapter as you write, or at least make it easy to find chapter boundaries (i.e. include “Chapter” in the heading, or some other indication; don’t rely on style alone). This also makes it easy to “give away” a sample chapter by posting it on the internet.
  2. Use styles instead of formats. In other words, use the built-in title, heading 1, text body, caption, etc. styles. If you don’t know how already, learn how to modify the styles to adjust the display, rather than formatting each piece of text by changing its typeface, font size, color, etc.
  3. Try to save and edit the document as HTML. You can switch from document view to source view using Word or OpenOffice. This allows you to edit in the WYSIWYG editor, or edit the HTML directly.
  4. If you need to be able to have the manuscript as a single DOC file (to send to a publisher for instance), you can use “master documents” to combine the front matter, chapter files, index and whatever else you have into one big file at the end. This is a feature in most major word-processors and makes it easier to work on large documents in general.
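Tip 1 pays off when it is time to split a single manuscript into chapters. As a hedged sketch, assuming the manuscript is plain text and every chapter heading sits on its own line beginning with the word “Chapter”, the split can be automated. The sample text and the function name are invented for the illustration.

```python
import re

# A heading is a whole line that starts with "Chapter" followed by a label.
# Note: a body line that happens to start with "Chapter ..." would also match.
CHAPTER_HEADING = re.compile(r"^Chapter[ \t]+\S.*$", re.MULTILINE)


def split_chapters(manuscript):
    """Split text into (heading, body) pairs at each chapter heading."""
    matches = list(CHAPTER_HEADING.finditer(manuscript))
    chapters = []
    for index, match in enumerate(matches):
        body_start = match.end()
        if index + 1 < len(matches):
            body_end = matches[index + 1].start()
        else:
            body_end = len(manuscript)
        heading = match.group().strip()
        body = manuscript[body_start:body_end].strip()
        chapters.append((heading, body))
    return chapters


SAMPLE = "Chapter 1\nIt was a dark night.\n\nChapter 2\nThe sun rose.\n"

for heading, body in split_chapters(SAMPLE):
    print(heading, "->", body)
```

Headings that differ from body text only by font size give a script like this nothing to search for, which is the reason not to rely on style alone.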

Putting it all together

Once you know what’s in there, making the e-book is straightforward. First convert your manuscript into a set of HTML files, one for each chapter. Then add the content.opf, toc.ncx, and mimetype files, and zip everything up into a single *.zip file. Change the “zip” extension to “epub” and you’re done. The challenge is in creating the OPF and NCX files. Luckily there are tools to help do this.
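For the curious, the zipping step can be sketched in Python. This is only an outline under some assumptions: the OPF and NCX contents are taken as ready-made strings (creating them is the hard part and is not shown), the chapter file names are up to you, and the function name build_epub is made up. Two details go beyond the description above: the ePUB container rules also expect a META-INF/container.xml file that points at the OPF file, and the mimetype file has to be the first entry in the archive and stored uncompressed.

```python
import io
import zipfile

# Points the reading system at the package file (content.opf in this sketch).
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(output, chapters, opf_xml, ncx_xml):
    """Write an ePUB archive to `output` (a file name or file-like object).

    chapters: dict mapping file names (e.g. "chapter1.html") to HTML text.
    opf_xml, ncx_xml: contents of content.opf and toc.ncx, prepared elsewhere.
    """
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype file goes in first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("content.opf", opf_xml)
        archive.writestr("toc.ncx", ncx_xml)
        for name, html_text in chapters.items():
            archive.writestr(name, html_text)


# Placeholder inputs only; a real book needs complete OPF and NCX files.
demo = io.BytesIO()
build_epub(demo, {"chapter1.html": "<html><body>One</body></html>"},
           "<package/>", "<ncx/>")
print(len(demo.getvalue()), "bytes written")
```

Writing straight to a name ending in “.epub” skips the rename step. The result of this sketch will only be a usable book if the OPF and NCX contents passed in are complete and correct.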

The most popular free tool for creating and converting e-book files is Calibre. Calibre can convert your manuscript into an ePUB file with little effort. The OPF and NCX files get created automatically. However, there are a lot of options, and any wrong setting could lead to an incomplete book with missing meta-data, cover image problems, or a broken TOC. Some of these problems may only become apparent when you try to convert the file to another format, such as MOBI.

Sigil is currently the only tool that allows you to create and edit ePUB files directly, without writing in one format and then converting. It is free, open source software and can work well once you are more comfortable with the ePUB format.

If you use OpenOffice or LibreOffice, you can use an OpenOffice extension that lets you save your manuscript directly as an ePUB. I chose to link directly to the file since the site is in Italian. If you want to see the original site and give kudos to the author, you can find it at http://lukesblog.it/ebooks/ebook-tools/writer2epub/ where there is information on installation and use. Google Translate does a good job with the translation.

Adobe InDesign is another option, but only if you have $700 to spare. Adobe now offers a monthly subscription license for $35/month if that better fits in your budget.

Once you have a well-formatted, publication quality ePUB, you can convert it to MOBI using Calibre, or use an online service such as 2epub.com or Convert.Files. Both services use the Calibre engine to do the conversion, but 2epub prompts for some metadata and overall is a better experience in my opinion. In order to create a Kindle e-book for distribution on Amazon, you need to convert the epub to the AZW format using their KindleGen application.

If you plan to have a publication quality document, you must make sure you do a thorough quality check by using an ePUB validator in addition to viewing the document in a reading application or better yet, on the target device. If you are not very tech savvy, you may want to delegate this e-book creation and conversion to a professional. It is my hope that you can find the help you need here on WritelyDone.
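Before running a full validator, a few packaging mistakes can be caught with a quick script. The sketch below is not a substitute for an ePUB validator. It checks only three container-level details (the position, content, and compression of the mimetype file, and the presence of META-INF/container.xml), and the function name and messages are invented for the example.

```python
import io
import zipfile


def quick_epub_problems(epub_file):
    """Return a list of basic packaging problems found in an ePUB archive."""
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first file in the archive")
        elif archive.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype does not contain application/epub+zip")
        elif entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


# Demo: an in-memory archive that is deliberately missing container.xml.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("chapter1.html", "<html></html>")

for problem in quick_epub_problems(demo):
    print(problem)
```

An empty list from this function means only that these three checks passed, so the thorough check on a validator and on the target device is still needed.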

New Logo and Site Update

WritelyDone is updating some of its branding, including the logo and site design. The new logo was designed by Miranda Myrhre and is a much stronger mark. The “Bringing readers and writers together” tag line is not changing, nor is the basic functionality of the site.

I plan to roll out a major theme update in the next few weeks. This will include the new logo, along with bringing in some color (*gasp*) to complete the branding and professional layout to the site. I have also been working on improving the usability of certain features as well as improving the experience for touch screen users. These changes will be incorporated into that release as well.

A new experience: Reading a book made of paper

Two things happened a couple of months ago related to one of my favorite book series, George R. R. Martin’s “A Song of Ice and Fire.” HBO’s excellent rendition of book one, “A Game of Thrones,” finished its first season, and the long awaited fifth book, “A Dance With Dragons,” was released. Since I had the Kindle version of it on pre-order, it appeared on my Kindle one day as if by magic. I was very excited to start reading right away, but I couldn’t remember what was going on at that point in the series. It took so long for the fifth book to be written that I had completely forgotten the thousands of pages I had previously read. I decided to read books 2-4 again.

Book 2, “A Clash of Kings,” was published in paperback back in 2000 by Bantam Books. So first I had to find the book. I’m pretty sure we have two copies since my wife and I both read these before we were living in the same house. We love to read, so our bookshelves are jammed with books, two rows deep. I’m forever pulling out books to search behind them. There are books in stacks, books in boxes, books on top of the shelves where no one can reach, and a couple of books scattered through the house. I searched and searched and was just about to give up when I found it. I immediately found a comfy place to sit with good light and started to read.

After reading a couple of pages, it dawned on me that this was the first “paper” book I had read in over a year. Since reading a physical book was a “fresh” experience for me, I thought I’d write about it.

A couple of things I have already mentioned. A physical book takes up space. A large number of books take up a lot of space. Anyone who reads and has moved to a new house will also tell you that they are heavy. Then there is the problem of locating the book. Unless you are extremely organized and catalog your library, finding anything becomes a quest in itself. Of course, along with the frustration, this quest does yield a sense of joy when the book is finally found.

Upon opening a book, you are greeted with one thing an e-book will never give you: the smell. Books have a unique odor that gets associated, at a primal level, with the sense of wonder and enjoyment you experience while reading. I love that smell. There’s also the soft whisper of page turning and the tactile feel of the pages themselves. Books also make a satisfying thump when closed. There are definitely things to love about traditional books.

When I read, I tend to read for hours at a time; this is when I started wishing I had this on my Kindle. The font is too small for my old eyes. Increasing the font size is a simple matter on an e-reader, but an impossibility for ink on paper. Of course, I could get one of those magnifying things, but I find them clumsy at best, and I often like to read “one-handed.” When holding a book by your fingers for hours, it can also get heavy, especially when reading a thick tome or a hardcover. The Kindle does not need to be held open and can be set down or propped up.

A normal lunch for me consists of food and reading. I was having some leftover pepperoni pizza while reading “A Clash of Kings.” Since I have to hold the book and turn pages, I was constantly having to mop my greasy fingers off on a napkin. Even trying to be tidy, the pages ended up discolored with the orange grease unique to pepperoni. There’s also the big splotch where a bit fell off while I was taking a bite and I instinctively caught it with the book. Over the course of several days, I spilled coffee, wine, and all manner of crumbs into the pages of this book. Am I a slob? I suppose so, but with my Kindle I just set it on the table and can read while using both hands to eat, and I can turn the page with my pinky, which generally stays clean.

Ever drop a book and lose your place? I did the other day. Ever do it with a book you’ve already read? Flipping through the book, everything looks vaguely familiar. My Kindle remembers my place. It even goes to the same spot when I open a book on my computer or laptop or iPod. Of course, I’ve had problems with something pressing on the ‘turn page’ button on the Kindle and paging through all the way to the end of the book. It is far easier to flip through a paper book than an e-book to find your place if it has been lost. It is just less likely you’ll lose your page in the first place.

My background is in engineering, so I don’t have a huge vocabulary. I’ll come across words that I think I know the definition of, but I’m not 100% certain. It is so easy to use the dictionary look-up feature on the Kindle that I’ll take the two seconds to find out for sure. I can’t even remember the last time I got up from reading a paper book to find a dictionary to look up a word. It was at least 20 years ago.

Apart from the smell and the sounds of a traditional book, I find e-books superior in every way. I can find them easily, they take up no space, I can read with “no hands,” I seldom lose my place, and all the other reasons mentioned above. With the advent of e-ink readers with their reflective display technology, and the improvements in battery life, e-readers are finally delivering on the promise of e-books. Still there are people, like my wife, who will only buy paper books, but I think the transition will be similar to what’s happening with streaming video on demand. I have Netflix, and Hulu, and HBO Go; do I need to buy DVDs or Blu-ray discs? But for special movies, having that physical copy is more than a copy of the movie. It’s tied to a memory of watching that movie with friends and family. Books will go the same way. Everyone will have a shelf of paper books, with their special books “on display,” but most of their consumption will be done digitally. It’s just so much easier.