The original MOBI format (used on early Kindle readers) was developed by Mobipocket, a French company that Amazon bought in 2005 to obtain the rights to the format. Kindles can read unprotected MOBI files (with extensions .mobi or .prc) as well as Amazon's own files (with extension .azw), which differ only in the way digital rights are handled. In 2011 Amazon introduced the expanded KF8 format, which is used by Kindle Fire tablets and newer Kindle readers.
The EPUB format is based on a standard developed by the International Digital Publishing Forum (IDPF). This format is used by almost all eBook publishers other than Amazon, e.g., Apple, Barnes & Noble, and Sony. An EPUB document consists of the HTML files containing the content along with several organizational files. These files are compressed into a single zip file with the extension .epub, e.g., MyBook.epub.
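To make the "single zip file" idea concrete, here is a minimal Python sketch that packs a few files into an EPUB-style archive and lists what is inside. The file names (OEBPS/chapter1.html and so on) and the placeholder contents are my own assumptions for illustration, so the result shows only the layout and is not a valid book. The one detail taken from the EPUB container rules is that the mimetype entry goes first and is stored uncompressed.

```python
import io
import zipfile


def pack_epub(files, target):
    """Write `files` (archive name -> bytes) into an EPUB-style zip at `target`."""
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry is written first and without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


def list_entries(source):
    """Return the names of all entries in the zip archive `source`."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


# Hypothetical file names with placeholder contents (not a valid book).
book = {
    "META-INF/container.xml": b"<!-- placeholder: points to the package file -->",
    "OEBPS/content.opf": b"<!-- placeholder: lists the files in the book -->",
    "OEBPS/chapter1.html": b"<html><body><p>Chapter text.</p></body></html>",
}

buffer = io.BytesIO()          # an in-memory stand-in for MyBook.epub
pack_epub(book, buffer)
for name in list_entries(buffer):
    print(name)
```

Passing a file name such as "MyBook.epub" instead of the in-memory buffer writes the archive to disk.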
The basic underlying language for both the EPUB and MOBI formats is HTML, the language of the web. This is not surprising, since an eBook reader, like a web browser, is not page oriented: the text is free flowing, and the user can change how much is displayed on a page by adjusting the font size, line spacing, margins, or words per line. HTML stands for HyperText Markup Language. It is a markup language, not a programming language. It consists of tags placed in the document file that indicate how the document should be organized and displayed. The tags do not modify the content. The older Kindles support only a subset of HTML, and many of their defaults are device-specific. CSS (Cascading Style Sheets) can be used to control the visual appearance of the document, but the older Kindles have limited support for CSS.
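The claim that tags describe the content without changing it can be checked with a short Python sketch. The sample sentence and the helper names (TextOnly, strip_tags) are my own, chosen for illustration. The sketch uses the standard library's HTML parser to throw away the tags and keep only the text between them.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text of a document and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for the text found between tags.
        self.pieces.append(data)


def strip_tags(markup):
    """Return the content of `markup` with all HTML tags removed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.pieces)


# <p> marks a paragraph, <i> italic text, and <b> bold text.
marked_up = "<p>It was a <i>dark</i> and <b>stormy</b> night.</p>"
print(strip_tags(marked_up))
```

The printed sentence is the same text the author typed. The tags only told the reader software which words form a paragraph and which words to emphasize.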
Since the basic underlying language is HTML, the obvious question is "How do I get my document into HTML?" Many authors who are not familiar with HTML construct their document in Microsoft Word and then save it as an HTML file, letting Word do the conversion. Unfortunately, the HTML that Word produces contains a lot of extraneous code that often leads to unintended consequences.
I prefer a different approach. The document can still be prepared in Microsoft Word if you like, but the conversion to HTML is handled differently. After some initial preparation, the text of the Word document is copied to the clipboard and then pasted into a text editor. Word processors such as Microsoft Word add extra information to the document file that is not visible on the screen. This hidden information specifies formatting such as fonts, paragraphs, page breaks, and line spacing, and it must be removed in order to have a valid HTML file. Pasting into a text editor removes it, because documents produced by text editors contain only text, with no added formatting information. A good free text editor is Notepad++. The remainder of the editing is done in the text editor.
The next step is to add the HTML tags, which provide the required document structure and formatting information. The tags could be added one by one, but this can be quite tedious. Instead we will make use of a powerful search and replace feature involving regular expressions. This will allow us to make multiple substitutions and deletions at the same time.
Once the HTML has been added, we construct the other informational files required by the EPUB standard. The collection of EPUB files is then zipped into a single compressed file. There are validation tools that can be used to ensure that the EPUB document has been properly constructed.
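The search and replace step described above can be sketched in Python, whose regular expressions are similar to those found in most text editors. This is only an illustration under my own assumptions: the pasted text has one paragraph per line, and the function name paragraphs_to_html and the sample text are hypothetical. Each call makes one substitution or deletion across the whole text at once.

```python
import html
import re


def paragraphs_to_html(text):
    """Wrap each non-empty line of plain text in <p>...</p> tags."""
    # Replace the characters &, < and > so they are not mistaken for markup.
    text = html.escape(text, quote=False)
    # Use one kind of line ending throughout.
    text = text.replace("\r\n", "\n")
    # Deletion: remove spaces and tabs left at the end of every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Substitution: every non-empty line becomes a paragraph element.
    text = re.sub(r"^(.+)$", r"<p>\1</p>", text, flags=re.MULTILINE)
    # Deletion: drop the blank lines between paragraphs.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


# Hypothetical text as it might look after pasting from a word processor.
pasted = "First paragraph.  \r\n\r\nSecond & last paragraph.\r\n"
print(paragraphs_to_html(pasted))
```

In a text editor the same idea is a search for the pattern ^(.+)$ with a replacement that wraps the captured line in paragraph tags. The exact replacement syntax depends on the editor.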
The validated EPUB document can then be converted to a MOBI document using either the KindleGen program or the Kindle Previewer, both supplied free of charge by Amazon.
In the next section we will describe the basic structure of an HTML document and will define some of the HTML tags that are valid in both MOBI and EPUB documents.