Linux command to read .odt?

J. Bakshi [j.bakshi at unlimitedmail.org]


Tue, 9 Jun 2009 21:11:52 +0530

Dear all,

Like catdoc (which reads .doc), is there any command to read .odt from the command line? I did a lot of googling but have not found any command like catdoc.

On the other hand, I have found that an .odt file is actually stored in zip format. So I ran unzip on an .odt file, and it successfully extracted a lot of files, including "content.xml", which actually holds the content :-)

Is there any tool which can extract the plain text from .xml?

Please suggest.

The content.xml looks like this:

<office:document-content office:version="1.2">
  <office:scripts/>
  <office:font-face-decls>
    <style:font-face style:name="Times New Roman" svg:font-family="'Times New Roman'" style:font-family-generic="roman" style:font-pitch="variable"/>
    <style:font-face style:name="Arial" svg:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
    <style:font-face style:name="Arial1" svg:font-family="Arial" style:font-family-generic="system" style:font-pitch="variable"/>
  </office:font-face-decls>
  <office:automatic-styles/>
  <office:body>
    <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
      </text:sequence-decls>
      <text:p text:style-name="Standard">This is a test </text:p>
    </office:text>
  </office:body>
</office:document-content>

Note the line that holds the actual text:

<text:p text:style-name="Standard">This is a test </text:p>
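As a rough illustration of pulling the plain text out of such a file, here is a minimal Python sketch using only the standard library. It assumes the XML carries the usual ODF namespace declarations on the root element (the listing above, apparently copied from a browser view, omits them), and it only collects <text:p> and <text:h> elements. The function name paragraphs_from_content_xml and the SAMPLE string are made up for this example.

```python
import xml.etree.ElementTree as ET

# ODF "text" namespace; a real content.xml declares it on the root element.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs_from_content_xml(xml_text):
    """Return the text of each <text:h> and <text:p> element, in document order."""
    root = ET.fromstring(xml_text)
    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    result = []
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() also picks up text inside nested spans.
            result.append("".join(elem.itertext()))
    return result


# Hypothetical sample: a cut-down content.xml with namespace declarations added.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
      <text:p text:style-name="Standard">This is a test </text:p>
    </office:text>
  </office:body>
</office:document-content>"""

if __name__ == "__main__":
    for para in paragraphs_from_content_xml(SAMPLE):
        print(para)
```

This is only a sketch: it ignores tables, lists and other structure, and a paragraph nested inside another paragraph (for example in a text frame) would be reported twice.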




Kapil Hari Paranjape [kapil at imsc.res.in]


Tue, 9 Jun 2009 21:45:42 +0530

Hello,

On Tue, 09 Jun 2009, J. Bakshi wrote:

> Like catdoc (which reads .doc), is there any command to read .odt from
> the command line? I did a lot of googling but have not found any
> command like catdoc.

Try unoconv.

Kapil.




J. Bakshi [j.bakshi at unlimitedmail.org]


Tue, 9 Jun 2009 22:10:19 +0530

On Tue, 9 Jun 2009 21:45:42 +0530 Kapil Hari Paranjape <kapil@imsc.res.in> wrote:

> Hello,
> 
> On Tue, 09 Jun 2009, J. Bakshi wrote:
> > Like catdoc (which reads .doc), is there any command to read .odt from
> > the command line? I did a lot of googling but have not found any
> > command like catdoc.
> 
> Try unoconv.

Thanks, but my requirement is a little different. There is an indexer running on a remote server; the site is powered by TYPO3. catdoc and its associated command-line tools are already installed there to read .doc, .xls, .pdf, etc. and do the indexing. I need one more tool that can also read .odt. unoconv needs OpenOffice itself, which is not an option on that server. My requirement could also be met by any CLI tool that simply extracts the plain text content from .xml, since I have already found that .odt saves its content in content.xml.




Francis Daly [francis at daoine.org]


Tue, 9 Jun 2009 18:09:15 +0100

On Tue, Jun 09, 2009 at 09:11:52PM +0530, J. Bakshi wrote:

Hi there,

> Like catdoc (which reads .doc), is there any command to read .odt from
> the command line? I did a lot of googling but have not found any
> command like catdoc.

I use the "o3read" suite wrapped in a script which does, essentially,

 unzip -p "$input" content.xml | o3totxt
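For comparison, the same two steps (read content.xml out of the zip archive, then strip the markup) can be sketched in Python with only the standard library. This is not o3read; it is a stand-in written for illustration, the function name odt_to_text is made up, and it assumes the document text lives in <text:p> and <text:h> elements of content.xml.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# ODF "text" namespace used by paragraph and heading elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_to_text(path):
    """Return the paragraphs and headings of an .odt file, one per line.

    `path` may be a file name or a binary file-like object.
    """
    # Step 1: the equivalent of `unzip -p "$input" content.xml`.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("content.xml")

    # Step 2: keep only the text of <text:p> and <text:h> elements.
    root = ET.fromstring(xml_bytes)
    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    lines = ["".join(e.itertext()) for e in root.iter() if e.tag in wanted]
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage (assumed script name): python odtcat.py file1.odt file2.odt
    for name in sys.argv[1:]:
        print(odt_to_text(name))
```

Because it needs nothing beyond a Python interpreter, a script like this might suit a server where installing an office suite is not possible, though it drops all formatting and handles tables and lists only as flat paragraphs.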

When I google for the words

 read .odt from command line

the fifth link included a pointer to http://stosberg.net/odt2txt/, and claimed that a Debian package is available.

 apt-get install odt2txt

gives me a new command which preserves more formatting than o3totxt did.

I may well switch to using that instead now.

I only tested these on older .odt files. Perhaps new ones work less well.

> Is there any tool which can extract the plain text from .xml?

I usually use "xmlstarlet" for this; occasionally the input needs to be processed to help xmlstarlet understand it.

I use that for a limited RSS reader, as well as for a Gmail notifier.

f

-- 
Francis Daly        francis@daoine.org




J. Bakshi [j.bakshi at unlimitedmail.org]


Tue, 9 Jun 2009 22:42:15 +0530

On Tue, 9 Jun 2009 18:09:15 +0100 Francis Daly <francis@daoine.org> wrote:

> On Tue, Jun 09, 2009 at 09:11:52PM +0530, J. Bakshi wrote:
> 
> Hi there,

Hello,

Thanks for your solution. In the meantime, I have found something great!!!

 abiword --to=txt myfile.odt

This will create myfile.txt. The next step is just "cat myfile.txt".
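For an indexer that wants the text back in one call, the two steps could be wrapped roughly as follows. This Python sketch assumes abiword is on the PATH and that, as described above, it writes the .txt file next to the input with the same base name; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def abiword_command(odt_path):
    """Build the conversion command exactly as given in this thread."""
    return ["abiword", "--to=txt", str(odt_path)]


def odt_text_via_abiword(odt_path):
    """Convert an .odt file with AbiWord and return the resulting text.

    Assumption: AbiWord writes <basename>.txt beside the input file.
    """
    odt_path = Path(odt_path)
    # check=True raises CalledProcessError if abiword exits with an error.
    subprocess.run(abiword_command(odt_path), check=True)
    txt_path = odt_path.with_suffix(".txt")
    return txt_path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Only show the command here; running it requires AbiWord to be installed.
    print(" ".join(abiword_command("myfile.odt")))
```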

with best wishes :-)

