Saturday, December 08, 2007
LaTeX to OpenOffice/Word
Recently I had to convert a large LaTeX document full of maths to an Office format. Copy-pasting it with all the formulas is not an option (all the more so given the terrible state of equation editing in Microsoft Office 2007). After a lot of searching, I found a way to automate a large chunk of the process using LyX and OpenOffice.
- Import the LaTeX document into LyX.
- Export the document from LyX to HTML. All the math formulas will be converted to images.
- Open the HTML document in Firefox.
- Start OpenOffice Writer, select everything on the Firefox page, and copy-paste it into a new document. Save the document (I used the .doc format because I wanted MS Word compatibility). The office document should look similar to the LaTeX one, with all the math equations and symbols as images. However, the images are only linked, not embedded, so they will no longer be displayed if you remove the HTML files or copy the document to another machine. The images should therefore be embedded in the office document.
- To embed the images in OpenOffice Writer, open Edit->Links, select all the images and click Break Links. This removes the references to the externally linked image files and keeps the images inside the document.
- The document is now ready, although some manual editing will be required because:
  - Tables and footnotes were missing
  - Images were missing (maybe because I was using EPS files)
  - References to sections, figures, and equations were not always present
  - Citations were hyperlinks to an unrelated document
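Before breaking the links, it can help to check which image files the exported HTML actually points to. The Python sketch below is my own addition, not part of LyX or OpenOffice (the function names are hypothetical): it lists every `<img>` reference in the HTML page, reports files that do not exist next to it, and flags EPS/PS figures, which Firefox cannot display and which may explain the missing images noted above.

```python
import os
import sys
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def check_images(html_text, base_dir):
    """Return (src, problem) pairs for images that will not survive the copy-paste."""
    collector = ImageCollector()
    collector.feed(html_text)
    problems = []
    for src in collector.sources:
        path = os.path.join(base_dir, src)
        if not os.path.exists(path):
            problems.append((src, "missing file"))
        elif src.lower().endswith((".eps", ".ps")):
            problems.append((src, "PostScript, not displayable in a browser"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    html_file = sys.argv[1]
    with open(html_file, encoding="utf-8", errors="replace") as f:
        text = f.read()
    # Image paths in the HTML are resolved relative to the HTML file's folder.
    base = os.path.dirname(os.path.abspath(html_file))
    for src, problem in check_images(text, base):
        print(f"{src}: {problem}")
```

Run it as `python check_images.py paper.html` (the script name is a placeholder); every line it prints is an image that will probably need fixing by hand in the office document.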
I was able to convert a 10-page LaTeX document to a Word document in an hour. The formatting was not great: all the math equations were slightly displaced vertically and were underlined. But at least it was readable.
For reference, I am using LyX 1.5.2 with OpenOffice 2.3 on openSUSE 10.3.
@Feanor: look at the MathML and LaTeX plugins for Movable Type and WordPress.
The annoying problem is *inline math*: if you insert MathType objects, the line spacing gets messed up (I've seen this often in papers). The proper way is to do the formatting manually (subscripts, superscripts, etc.). It can get complicated when you have a subscript of a subscript (you have to lower the text and use a smaller font), both a subscript and a superscript (use "Equation Fields"), or unusual symbols (use the MathType font or, in the worst case, draw them in Visio).
The figures are time-consuming too. I use pstoedit, which works as a GSView plugin, to convert EPS to EMF (EMF files can be edited in Visio). If you use gnuplot for the experiments, set the output terminal directly to EMF and regenerate the figures with a script.
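The gnuplot route could be automated along these lines. This is a minimal Python sketch under assumptions of my own: each `.gp` script produces a single figure, gnuplot is on the PATH and was built with the `emf` terminal, and only the spellings `set terminal`/`set term` and `set output`/`set out`/`set o` are rewritten (other abbreviations are left alone). The script and function names are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches "set terminal ..." and "set output ..." lines, including the common
# abbreviations "term", "out" and "o"; "set termoption" is deliberately kept.
TERM_OR_OUTPUT = re.compile(r"^\s*set\s+(?:terminal|term|output|out|o)\b")


def to_emf_script(script_text, emf_name):
    """Return the script with its terminal/output settings replaced by EMF ones."""
    kept = [line for line in script_text.splitlines()
            if not TERM_OR_OUTPUT.match(line)]
    header = ["set terminal emf", f'set output "{emf_name}"']
    return "\n".join(header + kept) + "\n"


def regenerate(gp_path):
    """Run gnuplot on a rewritten copy of gp_path, producing a .emf next to it."""
    src = Path(gp_path)
    script = to_emf_script(src.read_text(), src.with_suffix(".emf").name)
    # Run from the script's folder so relative data-file paths still resolve;
    # gnuplot reads the commands from standard input.
    subprocess.run(["gnuplot"], input=script, text=True, check=True,
                   cwd=src.parent)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        regenerate(path)
```

For example, `python regen_emf.py fig1.gp fig2.gp` would write `fig1.emf` and `fig2.emf` next to the scripts. Font sizes and line styles may differ from the EPS versions, so a quick visual check is still worthwhile.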
Citations and figure/table references take time too. The proper way is to use Word's "cross-reference" feature, but if you don't expect to edit the Word document often, simply type the numbers by hand.
Overall, one hour for a 10-page paper is ... very fast!
I prepare the source document in OpenOffice and then export it to whatever I like; LaTeX export works really well with the writer2latex tool.
After that I do minor or major manual editing of my LaTeX file, depending on what I want. Conversion to .doc, HTML or anything else works too, and I am happy.
The ODF format is really amazing: if something in your document annoys you, you can even unzip it, hand-edit the XML file and reassemble everything manually. Gorgeous stuff!
It is really cool.
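For the hand-editing trick, note that an ODF file is a ZIP archive whose `mimetype` entry must come first and be stored uncompressed, so zipping the folder back with an arbitrary tool can produce a file the office suite rejects. A minimal Python sketch (my own; the `unpack`/`repack` names and the script name are hypothetical) that unzips a document into a folder for editing and repackages it in the required order:

```python
import os
import sys
import zipfile


def unpack(odf_path, folder):
    """Extract every file of an ODF package (.odt, .ods, ...) into folder."""
    with zipfile.ZipFile(odf_path) as z:
        z.extractall(folder)


def repack(folder, odf_path):
    """Zip folder back into an ODF package, mimetype first and uncompressed."""
    with zipfile.ZipFile(odf_path, "w") as z:
        # ODF packages require "mimetype" as the first, uncompressed entry.
        z.write(os.path.join(folder, "mimetype"), "mimetype",
                compress_type=zipfile.ZIP_STORED)
        # Every other file in the folder is packed (delete editor backups first).
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, folder).replace(os.sep, "/")
                if arcname != "mimetype":
                    z.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__" and len(sys.argv) == 4:
    command, first, second = sys.argv[1:]
    if command == "unpack":
        unpack(first, second)
    elif command == "repack":
        repack(first, second)
```

For example: `python odfpack.py unpack report.odt report_dir`, edit `report_dir/content.xml`, then `python odfpack.py repack report_dir report_new.odt` (all file names are placeholders).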