eZ Tika 1.2 released (binary file indexing)

eZ Tika 1.2, based on Apache Tika 0.6-dev, brings many under-the-hood improvements and can convert most binary file types to plain text for subsequent indexing by any search plugin, most notably eZ Find.

eZ Tika is useful in general, but if your eZ Publish site has a lot of binary files ("attachments") or is geared towards document management system (DMS) capabilities, eZ Tika is a must! It can convert any modern "office" file type for subsequent indexing, including the latest MS Office and OpenOffice file types.

There is one caveat left for general use: indexing of PDF files with Asian and sometimes Cyrillic characters. This is best done using xpdf, and eZ Tika provides a dedicated wrapper script and settings file for xpdf that you can use almost out of the box.
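
To give an idea of what an xpdf-based conversion step looks like, here is a minimal sketch in Python. It is not the wrapper script shipped with eZ Tika; the function names are hypothetical, and it assumes that xpdf's pdftotext command is installed and on the PATH. It asks pdftotext for UTF-8 output on stdout, which is the kind of plain text a search plugin can then index.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, encoding="UTF-8"):
    """Build the pdftotext argument list (hypothetical helper).

    "-enc" selects the output text encoding and the trailing "-"
    tells pdftotext to write the extracted text to stdout.
    """
    return ["pdftotext", "-enc", encoding, str(Path(pdf_path)), "-"]


def pdf_to_text(pdf_path, encoding="UTF-8"):
    """Run pdftotext and return the extracted text as a string.

    Assumes the xpdf tools are installed; raises CalledProcessError
    if pdftotext exits with a non-zero status.
    """
    result = subprocess.run(
        build_pdftotext_command(pdf_path, encoding),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode(encoding, errors="replace")
```

Calling pdf_to_text("report.pdf") on a machine with xpdf installed would return the document text, ready to be handed to the indexer. For real use with eZ Publish, the wrapper script and settings file that come with eZ Tika remain the intended route.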
