Ralph Harris
Majors: Computer Science and French
University Honors with Honors in Computer Science
Supervisor: Dr. Curry I. Guinn
THE TURBOCHARGE SYSTEM: TRANSLATION USING RULE-BASED OPERATIONS COMBINED WITH
HIT-RATE ANALYSIS RESULTS FROM GOOGLE ENGINE
Machine translation from one natural language to another has proved to be a difficult
problem because of the ambiguities inherent in natural languages as well as the complexity
of the morphological and syntactic transformations necessary to produce well-formed
sentences. Verb conjugation is a case in point: the verb must agree
with the subject both in number and person. In addition, each tense (past,
present, future, etc.) of the verb often requires additional morphological and syntactic
changes. These changes are not consistent across languages, even among languages in
the same family, such as the Romance languages. One of the goals of this research
project is to develop a multi-lingual database that encodes the rules for the necessary
morphological and syntactic changes that occur in verb conjugation. The fields
in this database should be identical for all languages included in this study: English,
French, Italian, and Spanish. Thus, the algorithms used to perform verb conjugation will
be identical for all languages; the only parameters will be the database entries. Translating
verbs (and other parts of speech) from language to language cannot rely on a simple
word-for-word translation by database lookup. Such translations are usually
inadequate because this method often cannot choose which word to prefer in the
target language when there are multiple possibilities. Correctly choosing the
most appropriate word often involves understanding the surrounding
context of the word. Current state-of-the-art technology for representing vast amounts of real-world knowledge is insufficient to resolve this problem. Statistical analysis of large volumes of text, however, can help determine which word is most frequently used in a particular context. The TURBOCHARGE system translates verbs from one language to another by using rule-based operations to determine the root form of the verb, database lookup to determine the possible translations, and statistical analysis of a large corpus to determine the most likely translation in context. Specifically, the TURBOCHARGE system uses the hit count returned by Google for partial sentences in the target language. For evaluation, the translations of potentially ambiguous sentences by the TURBOCHARGE system are contrasted with those produced by two popular online translation systems, AltaVista’s Babel Fish and Google Translate. The results indicate that TURBOCHARGE performs 32% better than Babel Fish in selecting the most appropriate word choice, and 40% better than Google Translate in verb conjugation.
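
The two ideas described above, a conjugation routine driven only by database entries and word choice driven by hit counts, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the TURBOCHARGE implementation: the table layout, the function names `conjugate` and `pick_by_hits`, and the restriction to regular present-tense French -er and Spanish -ar verbs. The hit counts are invented placeholders standing in for search-engine results, not measured values.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule table: every language uses the same fields,
# (person, number) -> ending, so one routine serves all languages.
# Only regular present-tense French -er and Spanish -ar verbs are covered.
INFINITIVE_SUFFIX: Dict[str, str] = {"fr": "er", "es": "ar"}
PRESENT_ENDINGS: Dict[str, Dict[Tuple[int, str], str]] = {
    "fr": {(1, "sg"): "e", (2, "sg"): "es", (3, "sg"): "e",
           (1, "pl"): "ons", (2, "pl"): "ez", (3, "pl"): "ent"},
    "es": {(1, "sg"): "o", (2, "sg"): "as", (3, "sg"): "a",
           (1, "pl"): "amos", (2, "pl"): "áis", (3, "pl"): "an"},
}


def conjugate(lang: str, infinitive: str, person: int, number: str) -> str:
    """Conjugate a regular verb; only the table entries differ per language."""
    suffix = INFINITIVE_SUFFIX[lang]
    if not infinitive.endswith(suffix):
        raise ValueError(f"{infinitive!r} is not a regular -{suffix} verb")
    stem = infinitive[: -len(suffix)]
    return stem + PRESENT_ENDINGS[lang][(person, number)]


def pick_by_hits(left: str, candidates: List[str], right: str,
                 hit_count: Callable[[str], int]) -> str:
    """Return the candidate whose partial sentence has the highest hit count."""
    def hits(candidate: str) -> int:
        phrase = " ".join(part for part in (left, candidate, right) if part)
        return hit_count(phrase)
    return max(candidates, key=hits)


# Placeholder counts invented for illustration; NOT real search results.
TOY_HITS: Dict[str, int] = {
    "je passe du temps": 900,
    "je dépense du temps": 4,
}


def toy_hit_count(phrase: str) -> int:
    """Stand-in for a search-engine query; unknown phrases count as zero."""
    return TOY_HITS.get(phrase, 0)


if __name__ == "__main__":
    # English "I spend time": "spend" maps to "dépenser" (money) or "passer" (time).
    forms = [conjugate("fr", verb, 1, "sg") for verb in ("dépenser", "passer")]
    print(pick_by_hits("je", forms, "du temps", toy_hit_count))
```

In this sketch the selection function takes `hit_count` as a parameter, so the toy lookup could be swapped for a real search query without changing the selection logic. Adding a language would mean adding table entries only, which mirrors the stated design goal of identical algorithms across languages.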