Machine translation

Machine translation is the process of translating texts (written and, ideally, spoken) from one natural language into another by means of a special computer program. The field of research concerned with building such systems is also called “machine translation”.

Forms of human-computer interaction in machine translation:

With post-editing: the source text is translated by the machine, and a human editor corrects the result.

With pre-editing: a human adapts the text for machine processing (removes possible ambiguities, simplifies and marks up the text), after which the program processes it.

With inter-editing: a human intervenes in the work of the translation system to resolve difficult cases.

Mixed systems (for example, combining pre-editing and post-editing).

To perform machine translation, special software that implements a translation algorithm is installed on the computer. A translation algorithm is a sequence of unambiguous, well-defined operations on the text that finds translation equivalents for a given language pair L1-L2 in a given direction of translation (from one specific language into another).

A machine translation system includes:

1. Bilingual dictionaries containing the necessary grammatical information (morphological, syntactic and semantic) to support the transfer of equivalent, variant and transformational translation equivalents;

2. Algorithms for grammatical analysis that implement one of the formal grammars adopted for automatic text processing.

There are also individual machine translation systems designed to translate between three or more languages.
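The dictionary component can be pictured as a data structure in which each source lemma maps to one or more target equivalents, each carrying morphological, syntactic and semantic information. The Python sketch below assumes a hypothetical English-Russian fragment; the class names, the fields plural_only, governs_case and semantic_field, and the entries themselves are illustrative assumptions, not the format of any real system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TargetEquivalent:
    """One target-language equivalent plus the grammar later stages need."""
    lemma: str
    pos: str                               # part of speech
    plural_only: bool = False              # morphological: plurale tantum noun
    governs_case: Optional[str] = None     # syntactic: case of the verb's object
    semantic_field: Optional[str] = None   # semantic: subject-area label


@dataclass
class DictionaryEntry:
    source_lemma: str
    source_pos: str
    equivalents: list[TargetEquivalent] = field(default_factory=list)

    @property
    def is_polysemantic(self) -> bool:
        # More than one equivalent means context must choose between them.
        return len(self.equivalents) > 1


# Hypothetical English-Russian dictionary fragment.
EN_RU = {
    "sleigh": DictionaryEntry("sleigh", "noun", [
        TargetEquivalent("сани", "noun", plural_only=True)]),
    "help": DictionaryEntry("help", "verb", [
        TargetEquivalent("помогать", "verb", governs_case="dative")]),
    "table": DictionaryEntry("table", "noun", [
        TargetEquivalent("стол", "noun", semantic_field="furniture"),
        TargetEquivalent("таблица", "noun", semantic_field="data")]),
}


def lookup(lemma: str, semantic_field: Optional[str] = None) -> list[TargetEquivalent]:
    """Return the equivalents of a lemma, optionally filtered by subject area."""
    entry = EN_RU.get(lemma)
    if entry is None:
        return []
    if semantic_field is None:
        return list(entry.equivalents)
    return [e for e in entry.equivalents if e.semantic_field == semantic_field]
```

Here lookup("table", "data") returns only таблица, which is one simple way a subject-area label could narrow the choice between equivalents.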

The machine translation algorithm consists of the following stages:


1. First, the text is entered and the input word forms (words in a particular grammatical form, such as the dative plural) are looked up in the input dictionary (the dictionary of the language being translated from). The lookup is accompanied by morphological analysis, which determines the lexeme (the dictionary word as a unit) to which each word form belongs. Information about other levels of the language system can also be obtained while the word forms are analysed.

2. The next stage includes:
A) translation of idioms, phraseological units and clichés of the given subject area (for example, in English-Russian translation, phrases such as in case of and in accordance with are each treated as a single unit with one numeric code and are excluded from further grammatical analysis);
B) determination of the basic grammatical (morphological, syntactic, semantic and lexical) characteristics of the source text elements (for example, the number of nouns, the tense of verbs, and the syntactic functions of word forms in the sentence), carried out within the framework of the source language;
C) resolution of homography (conversion homonymy of word forms; for example, the English word round can be a noun, adjective, adverb, verb or preposition);


D) lexical analysis and translation of lexemes. At this stage, monosemantic words are usually separated from polysemantic ones (those with more than one translation equivalent in the target language). Monosemantic words are then translated from the list of equivalents. Polysemantic words are translated using so-called contextual dictionaries, whose entries are algorithms that query the context to determine whether contextual determinants of meaning are present or absent.

3. Final grammatical analysis, in which the necessary grammatical information is determined from target-language data (for example, with Russian nouns such as сани (sleigh) and ножницы (scissors), the verb must be in the plural even if the noun in the original is singular).

4. The synthesis of the target word forms and sentence as a whole in the target language.
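A heavily simplified Python sketch of stages 1 and 2 for English-Russian translation follows. It assumes tiny hypothetical dictionaries and shows longest-match lookup of fixed phrases (2A), a word-form dictionary that maps forms to lexemes with grammatical features (stage 1), and a contextual dictionary whose entries choose an equivalent by checking for a determinant word in the sentence (2D). Homograph resolution is omitted, and gloss is only a placeholder for stages 3 and 4: it joins lemma-level equivalents without agreement or inflection.

```python
# Toy sketch of stages 1-2 for English -> Russian.
# All dictionaries and rules are hypothetical illustrations.

# Stage 2A: fixed phrases translated as single units.
IDIOMS = {
    ("in", "accordance", "with"): "в соответствии с",
    ("in", "case", "of"): "в случае",
}
MAX_IDIOM = max(len(phrase) for phrase in IDIOMS)

# Stage 1: input dictionary mapping word forms to (lexeme, features).
WORDFORMS = {
    "bank": ("bank", {"pos": "noun", "number": "sg"}),
    "banks": ("bank", {"pos": "noun", "number": "pl"}),
    "river": ("river", {"pos": "noun", "number": "sg"}),
    "money": ("money", {"pos": "noun", "number": "sg"}),
}

# Stage 2D: monosemantic lexemes have a single equivalent...
MONOSEMANTIC = {"river": "река", "money": "деньги"}
# ...while the contextual dictionary lists (determinant, equivalent) pairs
# for polysemantic lexemes; None marks the default equivalent.
CONTEXTUAL = {"bank": [("river", "берег"), (None, "банк")]}


def translate_lexeme(lexeme, context):
    if lexeme in MONOSEMANTIC:
        return MONOSEMANTIC[lexeme]
    for determinant, equivalent in CONTEXTUAL.get(lexeme, []):
        if determinant is None or determinant in context:
            return equivalent
    return None  # lexeme missing from the dictionaries


def analyse(sentence):
    tokens = sentence.lower().split()
    # Lexemes of the whole sentence serve as the context for stage 2D.
    context = {WORDFORMS.get(t, (t, {}))[0] for t in tokens}
    units, i = [], 0
    while i < len(tokens):
        # Stage 2A: longest fixed phrase starting at position i.
        for n in range(min(MAX_IDIOM, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in IDIOMS:
                units.append({"source": " ".join(phrase),
                              "target": IDIOMS[phrase]})
                i += n
                break
        else:
            # Stage 1: map the word form to its lexeme and features.
            lexeme, features = WORDFORMS.get(tokens[i], (tokens[i], {}))
            units.append({"source": tokens[i], "lexeme": lexeme,
                          "features": features,
                          "target": translate_lexeme(lexeme, context)})
            i += 1
    return units


def gloss(units):
    """Stand-in for stages 3-4: join equivalents, marking unknown words."""
    return " ".join(u["target"] or f"<{u['source']}>" for u in units)


print(gloss(analyse("In accordance with the river banks")))
```

For "In accordance with the river banks", the phrase is matched as one unit and banks receives берег because river occurs in the sentence; the printed result is a word-by-word gloss, not a grammatical Russian sentence.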

Depending on the morphological, syntactic and semantic features of a particular language, as well as on the direction of translation, the general translation algorithm may include additional stages, modify these stages or change their order, but such variation is usually small in modern systems.

Analysis and synthesis can be performed either sentence by sentence or over the whole text loaded into the computer's memory. In the latter case, the translation algorithm also determines so-called anaphoric links (for example, the link between a pronoun and the noun it replaces: in "The system loads the text and translates it", it refers to text).
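One very simple way to find anaphoric links is to pair each pronoun with the nearest preceding noun that agrees with it in number. The Python sketch below assumes a tiny hypothetical lexicon and this recency-plus-agreement heuristic; practical systems typically need richer constraints, so treat it only as an illustration of why whole-text analysis matters: in the example, both links point back into the previous sentence.

```python
# Hypothetical toy lexicon: word -> (part of speech, grammatical number).
LEXICON = {
    "system": ("noun", "sg"),
    "text": ("noun", "sg"),
    "texts": ("noun", "pl"),
    "it": ("pron", "sg"),
    "them": ("pron", "pl"),
}


def resolve_anaphora(tokens):
    """Map the index of each pronoun to the index of its antecedent (or None)."""
    links = {}
    for i, token in enumerate(tokens):
        pos, number = LEXICON.get(token, (None, None))
        if pos != "pron":
            continue
        links[i] = None
        # Walk backwards to the nearest noun with the same number.
        for j in range(i - 1, -1, -1):
            cand_pos, cand_number = LEXICON.get(tokens[j], (None, None))
            if cand_pos == "noun" and cand_number == number:
                links[i] = j
                break
    return links


tokens = "the system loads the texts . it translates them".split()
print(resolve_anaphora(tokens))  # {6: 1, 8: 4}
```

Here it (index 6) links to system and them (index 8) to texts. Processed sentence by sentence, the second sentence alone contains no nouns, so neither pronoun would find an antecedent.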

Current machine translation systems are oriented towards a specific language pair (for example, French and Russian, or Japanese and English). These systems generally establish translation equivalents either at the surface level or at an intermediate level between the source and target languages.


The quality of machine translation depends on:
1) the size of the dictionary,
2) the amount of information attached to lexical items,
3) how carefully the analysis and synthesis algorithms are designed and tested,
4) the efficiency of the software.

Modern hardware and software make it possible to use large dictionaries containing detailed grammatical information. This information can be represented either in declarative (descriptive) form or in procedural form (tailored to the needs of the algorithm).
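To make the distinction concrete, here is a Python sketch of one contextual-dictionary entry stored in both forms. The word key, its Russian equivalents ключ and тональность, and the determinants major and minor are assumptions chosen for illustration: in declarative form the entry is data interpreted by a generic routine, while in procedural form the entry is itself a small program that queries the context.

```python
# Declarative form: the entry is plain data; a generic routine interprets it.
KEY_ENTRY = {
    "rules": [("major", "тональность"), ("minor", "тональность")],
    "default": "ключ",
}


def apply_declarative(entry, context):
    """Return the first equivalent whose determinant occurs in the context."""
    for determinant, equivalent in entry["rules"]:
        if determinant in context:
            return equivalent
    return entry["default"]


# Procedural form: the entry is a function written for this one word.
def key_entry(context):
    if "major" in context or "minor" in context:
        return "тональность"
    return "ключ"


for sentence in ("the key of the door", "a sonata in a minor key"):
    context = set(sentence.split())
    print(apply_declarative(KEY_ENTRY, context), key_entry(context))
```

Both forms give the same result here; the declarative one is easier to edit and extend, while the procedural one can encode checks that do not fit a fixed table format.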

Machine translation should be distinguished from computer-aided translation. The latter refers to automatic dictionaries that help people choose the right translation equivalents quickly. In practice, in both cases the computer works together with a human (a translator or editor). The term "machine translation", however, implies that the machine does the main and largest part of the work of translating and finding translation equivalents, leaving the human only to check and correct errors, whereas in computer-aided translation the computer is only an intermediate aid for finding translation equivalents quickly. That said, some features typical of machine translation systems can also be implemented in such dictionaries.

In translation practice and information technology, machine translation is used in two main ways:

1. Machine translation output can be used to get a rough idea of the contents of a document in a foreign language. In this case it serves as signal information and does not require careful editing.

2. Machine translation used instead of conventional human translation.

This second use requires careful editing and tuning of the translation system to a specific subject area. What matters here is the completeness of the dictionary and how well it covers the content and linguistic resources of the texts being translated, how effectively lexical polysemy is resolved, and how effective the algorithms for extracting grammatical information, finding translation equivalents and synthesis are. In practice, this type of translation is cost-effective if the volume of texts is large enough (at least tens of thousands of pages per year), the texts are sufficiently homogeneous, the system's dictionaries are complete and allow further expansion, and the software supports post-editing. Such machine translation systems are used by organizations with high demands for fast, high-quality translation.

A machine translation system must meet four requirements:


• efficiency;
• flexibility;
• speed;
• accuracy.


1. Efficiency: the ability to replenish the vocabulary constantly and to create new thematic dictionaries. In this respect, machine systems are far ahead of conventional printed dictionaries.
2. Flexibility: the ability to perform
1) "coarse tuning" to a specific subject area (specialized dictionaries serve this purpose), and
2) "fine tuning" to a specific text, book or group of documents (modified user dictionaries).
3. Speed: the ability to input text from paper automatically and process it.
4. Accuracy: stylistically and grammatically correct, adequate rendering of the meaning of the source text in the target text. This is the most "vulnerable" aspect of machine translation systems.
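The two levels of flexibility can be sketched as layered dictionaries searched from the most specific to the most general. This Python sketch uses the standard library's collections.ChainMap; the three dictionaries and their contents are hypothetical: a general dictionary, a specialized computing dictionary (coarse tuning) and a user dictionary for one database manual (fine tuning).

```python
from collections import ChainMap

# Hypothetical English-Russian dictionaries.
general = {"table": "стол", "key": "ключ", "mouse": "мышь"}
# Coarse tuning: a specialized dictionary for the computing domain.
computing = {"table": "таблица", "key": "клавиша"}
# Fine tuning: a user dictionary for one database manual, where "key"
# means a database key rather than a keyboard key.
database_manual = {"key": "ключ"}

# Lookups search the maps left to right, so more specific dictionaries win.
lookup = ChainMap(database_manual, computing, general)
print(lookup["table"], lookup["key"], lookup["mouse"])

# ChainMap is a view, so replenishing a dictionary takes effect immediately.
database_manual["index"] = "индекс"
print(lookup["index"])
```

Without the user dictionary, key would come out as клавиша (a keyboard key); adding database_manual restores ключ for this one document without changing the shared dictionaries.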

