EXTRAKT - Linguistic Engine

EXTRAKT widens the scope of the search query, thereby enhancing the search results.

Linguistic Engine

A conventional search often delivers a large number of hits. This seemingly extensive list of results often obscures the fact that a considerable number of additional relevant hits are missing, because conventional searches frequently fail to find the morphological variants of the search terms.

This is where the linguistic knowledge built into EXTRAKT comes into play. EXTRAKT automatically finds the other variants of a given search term and adds them to the search. These include the inflected forms of a word (for example plural forms), different spellings (British and American English, Swiss German spelling, the German spelling reforms, etc.) and variation in the use of umlauts and accents.
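As a rough illustration of the spelling-variant idea, the following Python sketch adds the transcribed form of a term that contains umlauts or ß. The mapping table and the function name `spelling_variants` are assumptions made for this example. They are not EXTRAKT's API, and a real engine would cover many more cases.

```python
# Hypothetical sketch: the mapping and the function name are illustrative
# and are not taken from EXTRAKT.
UMLAUT_TRANSCRIPTIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",  # Swiss German spelling uses "ss" instead of "ß"
}


def spelling_variants(term: str) -> set[str]:
    """Return the term together with its transcription without umlauts or ß."""
    transcribed = "".join(UMLAUT_TRANSCRIPTIONS.get(ch, ch) for ch in term)
    return {term, transcribed}


print(sorted(spelling_variants("Verträge")))
print(sorted(spelling_variants("Straße")))
```

A term without umlauts or ß, such as "international", simply comes back unchanged as a one-element set.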


For example, suppose the search terms are "Vertrag" and "international". If the search is run without the linguistic component, both terms are searched for exactly as entered, and the following variant forms are not considered:

for: international

  • internationale
  • internationalem
  • internationalen
  • internationaler
  • internationales

and for: Vertrag

  • Vertrags
  • Vertrages
  • Vertrage
  • Verträge
  • Verträgen

This obviously leads to incomplete results, which truncation cannot satisfactorily improve. With EXTRAKT, the different variant forms are recognized and searched automatically. Likewise, German compound words are split into their component parts, which makes searching significantly better here too. The old and new German spellings can also be searched simultaneously. EXTRAKT further improves searching with its multilingual component, which translates the search term into various languages.
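The effect of adding the variant forms can be sketched in a few lines of Python. The lexicon below contains only the forms listed in this example, and the function names and the boolean query syntax are assumptions for illustration. How EXTRAKT actually passes the expanded terms to the search is not described here.

```python
# Illustrative lexicon only: it holds the variant forms listed in the example.
# EXTRAKT's real dictionaries and interfaces are not shown in this document.
VARIANTS = {
    "international": ["internationale", "internationalem", "internationalen",
                      "internationaler", "internationales"],
    "Vertrag": ["Vertrags", "Vertrages", "Vertrage", "Verträge", "Verträgen"],
}


def expand_term(term):
    """Return the term followed by its known variant forms."""
    return [term] + VARIANTS.get(term, [])


def expand_query(terms):
    """Build a boolean query: OR within one term's forms, AND across terms."""
    groups = ["(" + " OR ".join(expand_term(t)) + ")" for t in terms]
    return " AND ".join(groups)


print(expand_query(["Vertrag", "international"]))
```

Each search term becomes a group of six alternatives, so a document containing "internationalen Verträgen" matches even though neither word was typed in that form.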

The standard languages in FILERO are German, French and English. So in our example, the following translations and their variant forms are searched as well:


French:

  • accord + accords
  • contrat + contrats
  • convention + conventions
  • traité + traités
  • international + internationaux


English:

  • agreement + agreements
  • convention + conventions
  • treaty + treaties
  • international

The ability to search for synonyms as well broadens the scope of the results yet further.
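The multilingual step can be pictured as a lookup in a bilingual dictionary followed by the same variant expansion. The Python sketch below uses only the translations and plural forms from the example above and assumes that the two lists are the French and English translations. The table layout, language codes and function name are hypothetical.

```python
# Hypothetical data layout, filled with the translations from the example.
TRANSLATIONS = {
    "Vertrag": {
        "fr": ["accord", "contrat", "convention", "traité"],
        "en": ["agreement", "convention", "treaty"],
    },
    "international": {
        "fr": ["international"],
        "en": ["international"],
    },
}

# Variant forms per (language, word), again taken from the example lists.
VARIANT_FORMS = {
    ("fr", "accord"): ["accords"],
    ("fr", "contrat"): ["contrats"],
    ("fr", "convention"): ["conventions"],
    ("fr", "traité"): ["traités"],
    ("fr", "international"): ["internationaux"],
    ("en", "agreement"): ["agreements"],
    ("en", "convention"): ["conventions"],
    ("en", "treaty"): ["treaties"],
}


def translate_and_expand(term, languages):
    """Map each language to the translations of term plus their variants."""
    result = {}
    for lang in languages:
        forms = []
        for translation in TRANSLATIONS.get(term, {}).get(lang, []):
            forms.append(translation)
            forms.extend(VARIANT_FORMS.get((lang, translation), []))
        result[lang] = forms
    return result


print(translate_and_expand("Vertrag", ["fr", "en"]))
```

Keeping the translation table and the variant table separate mirrors the split between bilingual and monolingual dictionaries, although whether EXTRAKT organizes its data this way is an assumption.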

In addition to German, English and French, other languages available are Italian, Spanish and Latin.

Additional languages (for example Dutch) are envisaged in the short term.

Technical foundations of EXTRAKT

The foundations of EXTRAKT are extremely large (unabridged) dictionaries in several different languages, as well as bilingual dictionaries. The sizes of these dictionaries range from a few thousand entries (for example, the dictionary for the new German spellings with ca. 6,000 entries) to nearly a million entries (the German compound dictionary contains around 940,000 entries). The bilingual dictionaries contain between 20,000 and 130,000 entries. The dictionaries are heavily compressed through a special process and can therefore be held in a computer's RAM. This makes analysis and translation especially fast.
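The text does not say how EXTRAKT uses its dictionaries to split compounds. As a rough sketch of one common dictionary-based approach, the Python example below splits a compound recursively by looking for known words, allowing an optional linking "s" between the parts. The tiny lexicon, the list of linking elements and the function name are all assumptions for illustration.

```python
# Toy lexicon for illustration; a real system would use a large dictionary.
LEXICON = {"vertrag", "recht", "abschluss", "partner"}

# Assumed linking elements between components: none, or the German linking "s".
LINKING_ELEMENTS = ("", "s")


def split_compound(word, lexicon=LEXICON):
    """Return the component words of a compound, or None if it cannot be split."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest possible first component first.
    for i in range(len(word) - 1, 0, -1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for link in LINKING_ELEMENTS:
            if link and not rest.startswith(link):
                continue
            tail = split_compound(rest[len(link):], lexicon)
            if tail:
                return [head] + tail
    return None


print(split_compound("Vertragsrecht"))
print(split_compound("Vertragsabschluss"))
```

With this lexicon, "Vertragsrecht" is split into "vertrag" and "recht", so a search for "Vertrag" could also reach documents that only contain the compound.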

Alongside these dictionaries, other computational-linguistic processes are used, for example the recognition of multi-word terms (e.g. the French "pomme de terre"). Special dictionaries, synonym dictionaries, thesauri and "private" (customised) dictionaries can easily be added to the system.
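Multi-word term recognition can be sketched as a greedy longest-match over the token sequence. The Python example below is a minimal illustration under that assumption. The term list and the function name are hypothetical, and the method EXTRAKT actually uses is not described here.

```python
# Hypothetical list of known multi-word terms, stored as lowercase token tuples.
MULTIWORD_TERMS = {("pomme", "de", "terre")}
MAX_TERM_LENGTH = max(len(term) for term in MULTIWORD_TERMS)


def group_multiword_terms(tokens):
    """Merge known multi-word terms into single units, longest match first."""
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(MAX_TERM_LENGTH, len(tokens) - i), 1, -1):
            candidate = tuple(tok.lower() for tok in tokens[i:i + n])
            if candidate in MULTIWORD_TERMS:
                result.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word term starts here; keep the single token.
            result.append(tokens[i])
            i += 1
    return result


print(group_multiword_terms("une pomme de terre cuite".split()))
```

Here "pomme de terre" is kept together as one unit, so it can be looked up and translated as a single term rather than as three separate words.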

EXTRAKT is based on the linguistic server EXTRAKT and is available for Windows NT, Windows 2000, Windows XP and Linux. It is already in use in several different applications, for example the internet search engine Scoutmaster.

EXTRAKT was developed in part with support from the European Union as part of the ESPRIT and LIBRARIES programmes.


The basic principles of the linguistic analysis of German used in EXTRAKT are described in: Stegentritt, Erwin (Ed.): German Analysis, Morpho-Syntax within the free-text retrieval project EMIR. (Sprachwissenschaft - Computerlinguistik. Linguistics - Computational Linguistics, vol. 15). Saarbrücken 1993.

You can find further information about EXTRAKT at www.textec.de.
