A Tool for Visually Handicapped People

Oku4.0

Presentation

Abstract:
The project involves the design and implementation of a tool for visually handicapped people who speak Turkish. Using this tool, visually handicapped people will be able to read, edit and print documents, browse the Internet, and send and receive email messages. As part of this project, a text-to-speech (TTS) engine suitable for the Turkish language will also be designed and developed.

Introduction:
With the decrease in the cost of personal computers, a need has arisen for software tools that enable visually handicapped people to read, edit and print documents. We have realized that visually handicapped people are not able to establish private communication with other people, since they are not able to read or write handwritten or typed documents. Also, with the decrease in the cost of Internet access, the demand for accessing the Internet among visually handicapped people has been increasing drastically. They would like to communicate with other people through email messages. The demand from visually handicapped Turkish-speaking people has been the motivation for the current project.

Previous Work:
Having realized the need for such a tool, we carried out some experiments under the name Oku ("read" in Turkish). The first versions, Oku.1 and Oku.2, were only editors. They used an experimental TTS engine that relied on the phonetic characteristics of the Turkish language. Turkish words are composed of syllables, and a word can be broken down into a sequence of syllables using a simple algorithm. The pronunciation of a syllable is the same regardless of the word in which it occurs. Making use of this fact, the TTS engine breaks a word into its syllables and then plays the corresponding sound files from its database. The database contained about one thousand syllables.
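The syllable-based approach can be illustrated with a short Python sketch. This is not the code used in Oku.1 or Oku.2; it is one common way to syllabify Turkish, assuming that every syllable contains exactly one vowel and that only the last consonant before a vowel starts the next syllable. The function names and the ".wav" naming of the sound files are hypothetical, and the input is assumed to be a single word without punctuation.

```python
TURKISH_VOWELS = set("aeıioöuü")


def turkish_lower(word):
    # Python's str.lower() maps "I" to "i", but Turkish pairs
    # "I" with "ı" and "İ" with "i", so handle both explicitly first.
    return word.replace("I", "ı").replace("İ", "i").lower()


def syllabify(word):
    """Split a single Turkish word into syllables (illustrative sketch)."""
    word = turkish_lower(word)
    vowel_positions = [i for i, ch in enumerate(word) if ch in TURKISH_VOWELS]
    if len(vowel_positions) < 2:
        # Zero or one vowel: the whole word is treated as one syllable.
        return [word] if word else []

    syllables = []
    start = 0
    for prev, nxt in zip(vowel_positions, vowel_positions[1:]):
        # If consonants separate two vowels, the last consonant opens the
        # next syllable; adjacent vowels are split between them.
        cut = nxt - 1 if nxt - prev > 1 else nxt
        syllables.append(word[start:cut])
        start = cut
    syllables.append(word[start:])
    return syllables


def sound_files(word):
    # Hypothetical naming scheme: one recorded file per syllable.
    return [syllable + ".wav" for syllable in syllabify(word)]
```

For example, syllabify("okumak") yields the syllables "o", "ku" and "mak", and sound_files would list the three recordings to be played in that order.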

We made these programs publicly available, free of charge, through a web page. We also sent copies on CD-ROM to those who did not have Internet access. The interest in the program encouraged us to continue working on this tool. A large number of users asked for an extension that would allow them to surf the Internet. Considering this feedback, we developed the latest version, called Oku.3. It included a web browser and an email client in addition to all the features of the previous versions. An important difference in Oku.3 is that it uses Mbrola, a multilingual speech synthesizer developed outside our group, as its speech engine. Mbrola is free for non-commercial applications. We used Mbrola in the form of a DLL and incorporated it into our system. Mbrola performed slightly better than the TTS used in the previous versions of Oku. The latest release, Oku.3.0.1, is available, again free of charge, from http://www.cs.bilkent.edu.tr/~guvenir/Oku/. With the addition of the Internet browser, Oku.3 received a lot of attention from the community of visually handicapped people in Turkey. They gave us very important feedback and asked for many improvements to the program. However, due to our limited resources, we have not been able to incorporate these suggestions into Oku.3.

Current Project:
Taking into consideration the feedback from the community of visually handicapped people in Turkey, we are redesigning the Oku program. In the current project, we aim to develop a professional-quality tool. The web browser of Oku.3 is an experimental tool that cannot handle many features of web pages, including forms. This is an important deficiency because, without forms, the user cannot use search tools such as Google and Yahoo. Web-based email services such as Hotmail are also not accessible in Oku.3. In the proposed project, all features of web pages, except pictures, will be handled.

The most important work in the proposed project is the design and implementation of a new TTS system suitable for the Turkish language. Mbrola, which we used in Oku.3, is a speech synthesizer based on the concatenation of diphones. It takes a list of phonemes as input, together with prosodic information (the duration of each phoneme and a piecewise linear description of pitch), and produces 16-bit linear speech samples at the sampling frequency of the diphone database used. It is therefore not a text-to-speech (TTS) synthesizer, since it does not accept raw text as input. The TTS system to be designed and developed in this project will accept raw text as input and produce high-quality speech samples. We are investigating new techniques for speech synthesis suitable for the Turkish language. All other feedback from the users will also be incorporated into the new version of the Oku program, including improvements in the handling of dates, times, numbers, abbreviations and units.
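To make the kind of input Mbrola expects more concrete, the following Python sketch builds text in Mbrola's phoneme (.pho) format, where each line holds a phoneme symbol, its duration in milliseconds, and optional pairs of (position in percent of the phoneme, pitch in Hz) that define the piecewise linear pitch curve. The function name to_pho is hypothetical, and the phoneme symbols and prosody values in the example are illustrative assumptions; the symbols actually accepted depend on the diphone database in use.

```python
def to_pho(segments):
    """Render (phoneme, duration_ms, pitch_points) tuples as .pho text.

    pitch_points is a list of (position_percent, pitch_hz) pairs and may
    be empty when no pitch target is given for a phoneme.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position_percent, pitch_hz in pitch_points:
            fields.append(str(position_percent))
            fields.append(str(pitch_hz))
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Illustrative input for the word "oku"; "_" denotes silence.
# The symbols, durations and pitch values are assumed, not measured.
example = [
    ("_", 100, []),
    ("o", 90, [(0, 120)]),
    ("k", 70, []),
    ("u", 110, [(100, 100)]),
    ("_", 100, []),
]

if __name__ == "__main__":
    print(to_pho(example), end="")
```

A complete TTS system has to produce such phoneme, duration and pitch information from raw text itself, which is the part that Mbrola leaves to its caller.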

Results:
This version of the project will be called Oku4. The results of this project will be made available to the public, freely and without restrictions, through this web site (http://www.cs.bilkent.edu.tr/~guvenir/Oku4).

Principal Investigator: H. Altay Guvenir, Ph.D.
Investigator: Engin Demir, MSc.
Investigators: Celal Ziftci, Firat Kart, Orhan Uctepe.

Duration: June 2003 - June 2004.

Sponsor: Microsoft Research Ltd.

Contract No: 2003-239