Ralph Harris

Majors: Computer Science and French

University Honors with Honors in Computer Science

Supervisor: Dr. Curry I. Guinn

 

THE TURBOCHARGE SYSTEM: TRANSLATION USING RULE-BASED OPERATIONS COMBINED WITH HIT-RATE ANALYSIS RESULTS FROM GOOGLE ENGINE

 

Machine translation from one natural language to another has proved to be a difficult problem because of the ambiguities inherent in natural languages and the complexity of the morphological and syntactic transformations necessary for well-formed output, as in verb conjugation. Verb conjugation requires that the verb agree with its subject in both number and person. In addition, each tense of the verb (past, present, future, etc.) often requires further morphological and syntactic changes. These changes are not consistent across languages, even among languages of the same family such as the Romance languages. One of the goals of this research project is to develop a multilingual database that encodes the rules for the morphological and syntactic changes that occur in verb conjugation. The fields in this database should be identical for all languages included in this study: English, French, Italian, and Spanish. Thus, the algorithms used for verb conjugation will be identical in all languages; the only parameters will be database entries. Translating verbs (and other parts of speech) from language to language cannot rely on simple word-for-word translation by database lookup. Such translations are usually inadequate because this method often cannot choose which word to prefer in the target language when there are multiple possibilities. Correctly choosing the most appropriate word often involves understanding the surrounding context of the word. Current state-of-the-art technology for representing vast amounts of real-world knowledge is insufficient to resolve this problem. Statistical analysis of large volumes of text can help determine which word is most frequently used in a particular context. The TURBOCHARGE system translates verbs from one language to another by using rule-based operations to determine the root form of the verb, database lookup to determine the candidate translations, and statistical analysis of a large corpus to determine the most likely translation in context. Specifically, the TURBOCHARGE system uses the hit count returned by Google for partial sentences in the target language. For evaluation, the translations of potentially ambiguous sentences produced by the TURBOCHARGE system are contrasted with those produced by two popular online translation systems, AltaVista’s Babel Fish and Google Translate. These results indicate that TURBOCHARGE performs 32% better than Babel Fish in selecting the most appropriate word choice, and 40% better than Google Translate in verb conjugation.
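The hit-count selection step described above can be illustrated with a minimal sketch. It is not the TURBOCHARGE implementation: the function name choose_translation, the {verb} placeholder convention, and the injected hit_count callable are all assumptions made for illustration, and the counts in the usage example are invented stand-ins for search-engine results rather than measured data.

```python
from typing import Callable, Iterable


def choose_translation(
    context_template: str,
    candidates: Iterable[str],
    hit_count: Callable[[str], int],
) -> str:
    """Return the candidate whose filled-in phrase has the highest hit count.

    context_template is a partial sentence in the target language with a
    "{verb}" placeholder (a hypothetical convention for this sketch).
    hit_count is any callable mapping a phrase to a number of hits; here it
    stands in for a search-engine query and is an assumption of the sketch.
    """
    scored = [
        (hit_count(context_template.format(verb=candidate)), candidate)
        for candidate in candidates
    ]
    # max() keeps the first candidate when several share the top count.
    _, best = max(scored, key=lambda pair: pair[0])
    return best


# Invented counts for illustration only; not real search results.
FAKE_COUNTS = {
    "je sais la réponse": 900,
    "je connais la réponse": 1500,
}


def fake_hit_count(phrase: str) -> int:
    """Look up a phrase in the invented table, defaulting to zero hits."""
    return FAKE_COUNTS.get(phrase, 0)


best = choose_translation(
    "je {verb} la réponse", ["sais", "connais"], fake_hit_count
)
print(best)
```

Passing the hit-count source as a parameter keeps the selection logic independent of any particular search engine, so the same function could be exercised with a local corpus count during testing.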