Option: Linguistic analysis module WebCANAL

Linguistic analysis and translation module

What does this component offer?

A conventional search often delivers a multitude of hits. But the sheer number of hits can hide the fact that a considerable number of other hits were missed, because a conventional search does not reach the (morphological) variants of the search term.

Here the linguistic “knowledge” of WebCANAL comes into play: it automatically generates the various forms of the search terms and adds them to the search. These include the different forms of a word (inflected forms, plurals, case forms etc.), the different ways of writing a word (such as the new and old spelling in German, Swiss-German spelling, British and American English), and the written-out form of umlauts (ä becoming ae).
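The umlaut case is the simplest of these variants to illustrate. The following is a minimal sketch, not WebCANAL's actual code: the function name `spelling_variants` and the mapping table are assumptions made for illustration, and only the rule "ä becoming ae" comes from the description above.

```python
from __future__ import annotations

# Illustrative mapping only; the text above names just "ä becoming ae".
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def spelling_variants(term: str) -> set[str]:
    """Return the term together with its written-out umlaut form."""
    written_out = "".join(UMLAUT_MAP.get(ch, ch) for ch in term)
    # A set collapses both entries into one when the term has no umlauts.
    return {term, written_out}


variants = spelling_variants("Verträge")  # contains "Verträge" and "Vertraege"
```

Both variants would then be added to the search, so that records indexed as "Vertraege" are found as well.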

Example

For example, if "internationalen Verträge" is searched for, WebCANAL will first recognise the base forms of the words ("international" and "Vertrag"), and then generate the other forms:

for: international

  • internationale
  • internationalem
  • internationalen
  • internationaler
  • internationales

and for: Vertrag

  • Vertrags
  • Vertrages
  • Vertrage
  • Verträge
  • Verträgen
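The two steps in this example (recognise the base form, then generate all forms) can be sketched as lookups in a small full-form dictionary. This is a hedged illustration under stated assumptions: the names `FORMS_BY_BASE`, `expand_term` and `expand_query` are hypothetical, the data is limited to the forms listed above, and WebCANAL's real dictionaries and interfaces are not shown here.

```python
from __future__ import annotations

# Toy data taken from the example above; a real full-form dictionary
# would be far larger.
FORMS_BY_BASE = {
    "international": [
        "international", "internationale", "internationalem",
        "internationalen", "internationaler", "internationales",
    ],
    "Vertrag": [
        "Vertrag", "Vertrags", "Vertrages", "Vertrage",
        "Verträge", "Verträgen",
    ],
}

# Inverted index: every full form points back to its base form.
BASE_BY_FORM = {
    form: base for base, forms in FORMS_BY_BASE.items() for form in forms
}


def expand_term(word: str) -> list[str]:
    """Return all known forms of a word, or the word itself if unknown."""
    base = BASE_BY_FORM.get(word)
    if base is None:
        return [word]
    return list(FORMS_BY_BASE[base])


def expand_query(query: str) -> dict[str, list[str]]:
    """Expand every whitespace-separated word of the query."""
    return {word: expand_term(word) for word in query.split()}


expanded = expand_query("internationalen Verträge")
```

Words that are not in the dictionary are passed through unchanged, so the expanded search never returns fewer terms than the original one.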

Because the search now covers every form of each term, it finds hits that a conventional search would miss. Effective searching becomes even harder when the data holdings are in several languages, which is generally the case in libraries.

The translation component of WebCANAL translates the terms of the original search into different languages. The various forms of each translated term are then also generated and added to the search.

For the example "internationale Verträge", the following additional terms and forms result:

French:

  • accord + accords
  • contrat + contrats
  • convention + conventions
  • traité + traités
  • international + internationaux

English:

  • agreement + agreements
  • convention + conventions
  • treaty + treaties
  • international

Widening the search in this way can markedly improve the results.
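The translation step can be pictured as a second dictionary lookup followed by the same form generation as before. The sketch below is an assumption-laden illustration, not WebCANAL's interface: `TRANSLATIONS`, `TARGET_FORMS` and `translate_and_expand` are hypothetical names, and the data is restricted to the French and English terms listed above.

```python
from __future__ import annotations

# Toy translation dictionary: (base form, target language) -> translations.
TRANSLATIONS = {
    ("Vertrag", "fr"): ["accord", "contrat", "convention", "traité"],
    ("Vertrag", "en"): ["agreement", "convention", "treaty"],
    ("international", "fr"): ["international"],
    ("international", "en"): ["international"],
}

# Toy full-form data for the target languages, from the lists above.
TARGET_FORMS = {
    ("fr", "accord"): ["accord", "accords"],
    ("fr", "contrat"): ["contrat", "contrats"],
    ("fr", "convention"): ["convention", "conventions"],
    ("fr", "traité"): ["traité", "traités"],
    ("fr", "international"): ["international", "internationaux"],
    ("en", "agreement"): ["agreement", "agreements"],
    ("en", "convention"): ["convention", "conventions"],
    ("en", "treaty"): ["treaty", "treaties"],
    ("en", "international"): ["international"],
}


def translate_and_expand(base: str, lang: str) -> list[str]:
    """Translate a base form, then list every form of each translation."""
    terms: list[str] = []
    for translation in TRANSLATIONS.get((base, lang), []):
        # Fall back to the bare translation if no forms are recorded.
        for form in TARGET_FORMS.get((lang, translation), [translation]):
            if form not in terms:
                terms.append(form)
    return terms


english_terms = translate_and_expand("Vertrag", "en")
```

Keeping translation and form generation as separate lookups mirrors the description above: a new language pair only needs a translation dictionary plus the full-form data of the target language.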

As well as German, English and French, WebCANAL also covers Italian, Spanish and Latin. Other languages (such as Dutch) are in preparation.

Technical information about WebCANAL

WebCANAL is based on very large (full form) dictionaries of the various languages and language pairs. These dictionaries range in size from a few thousand entries (for example the dictionary for the new German orthography, with ca. 6,000 entries) to almost a million entries (the German composite dictionary contains about 940,000 entries). The translation dictionaries contain between 20,000 and 130,000 entries. The dictionaries are compressed via a special process and can be held in the computer's random access memory, which makes analysis and translation particularly fast.

Alongside these dictionaries, other processes are at work, for example the recognition of multi-word terms (e.g. French "pomme de terre"). Special dictionaries, dictionaries of synonyms, thesauri, and 'private' dictionaries can easily be added to the system.
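One common way to recognise multi-word terms is a greedy longest-match scan over the words of the query. The source does not say which method WebCANAL uses, so the following is only a sketch of that general technique; `MULTIWORD_TERMS` and `split_into_terms` are hypothetical names, and the single dictionary entry is the example given above.

```python
from __future__ import annotations

# Toy multi-word dictionary; each entry is a tuple of lower-case words.
MULTIWORD_TERMS = {("pomme", "de", "terre")}
MAX_WORDS = max(len(term) for term in MULTIWORD_TERMS)


def split_into_terms(text: str) -> list[str]:
    """Group words into terms, preferring the longest known multi-word term."""
    words = text.lower().split()
    terms: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to two words.
        for n in range(min(MAX_WORDS, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in MULTIWORD_TERMS:
                terms.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No multi-word term starts here: keep the single word.
            terms.append(words[i])
            i += 1
    return terms


terms = split_into_terms("salade de pomme de terre")
```

Treating "pomme de terre" as one unit matters for the later steps: it is looked up and translated as a single term rather than as three unrelated words.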

WebCANAL, which is based on the EXTRACT linguistic server, is available for Windows and Linux, and is used in various other applications, such as the internet search engine Scoutmaster.

WebCANAL can be traced back to a project which was partly supported by the European Union within the framework of the ESPRIT and LIBRARIES programmes.

Literature

The basis of the linguistic analysis for German, as it is carried out in WebCANAL, is described in: Stegentritt, Erwin (Ed.): German Analysis, Morpho-Syntax within the free-text retrieval project EMIR. (Linguistics – Computational Linguistics vol. 15). Saarbrücken 1993.

A comparison of a linguistically enhanced search with a conventional search, on the basis of 50 searches, can be found in German in: Stegentritt, Erwin: Evaluationsresultate des mehrsprachigen Suchsystems CANAL/LS. In: ABI-Technik, Nr. 1, 1998, pp. 38-46.

WebCANAL services

Every library has its own conditions and requirements. Special dictionaries or other services, tailored to the data inventory of the library and the wishes of the users, are offered in conjunction with WebCANAL.

Among these are the generation of additional, domain-specific dictionaries, the automatic generation of synonym lists, and the extension of WebCANAL with additional functions, such as a phonetic search.

Content from WebCANAL

Synonyms and related terms

Alongside the previously mentioned standard dictionaries contained in WebCANAL, additional lexical information sources can also be integrated. Among them is, for example, a German dictionary of synonyms and related terms.

Thus the user can be offered a list of terms to choose from.

Example in German: LOHN

abgeltung, arbeitseinkommen, arbeitslohn, auszahlung, bezahlen, bezahlung, entgelt, entlohnung, gehalt, honorieren, verdienst, vergütung, zahlen, zahlung.

Thesaurus

Various thesauri are also offered with WebCANAL. Here is an example from a German legal thesaurus.

VERTRAGSBRUCH:

Teil von VERTRAGRECHT
Verursacht: SCHADENERSATZANSPRUCH
ENGLISCH: BREACH OF CONTRACT
FRANZÖSISCH: RUPTURE DE CONTRAT

Example from WebCANAL with the word "law suite"

After clicking on the symbol for the analysis of terms, the user receives the following selection:

After the selection of the desired languages and once the button 'Analyse translation' has been clicked on, the following results list appears, from which the user selects the desired search terms:

With the help of the 'TRANSFER' button, all the search terms selected by the user will be transferred to WebOPAC as 'OR searches':

Terms transferred: Erlass, droit
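The transfer step amounts to joining the selected terms into a single OR expression. The sketch below assumes a simple `term OR term` syntax with quoted multi-word terms; the actual query syntax of WebOPAC is not given in the source, and `build_or_query` is a hypothetical name.

```python
from __future__ import annotations


def build_or_query(terms: list[str]) -> str:
    """Join the selected terms into one OR expression (assumed syntax)."""
    # Drop empty entries and duplicates while keeping the user's order.
    unique = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    # Quote multi-word terms so they stay together as one search term.
    quoted = [f'"{t}"' if " " in t else t for t in unique]
    return " OR ".join(quoted)


query = build_or_query(["Erlass", "droit"])
```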
