Text To Speech

Welcome

INDEX

Problem Statement

Highlight the text you want to listen to, then press the Selected Text button on the Text Toolbar of Firefox, and it will be played for you.

Do you know that paper is 10 times easier to read than a computer screen? CoolSpeaking can read any text on your computer for you.

Just highlight the text you want to hear, and then press the play button to hear it.

Typical Applications

Text to Speech Features

Recommended Hardware

System Requirements

Motivation

Information and communication technology is rapidly evolving as an effective tool for making information widespread and available online to many communities. Industrial society is turning into an information society. The increased use of information technology is enabling people across the world to participate in the knowledge network; however, visually impaired people in developing countries like Mongolia are being deprived of the benefits of computer systems. One of the main reasons for this is the lack of suitable human-computer interfaces and of software designed and developed to meet local needs. Designing and developing a computer interface for a person who cannot see what the computer displays is a most challenging task for many software developers.

Developed countries like Japan have many public projects and commercial software companies addressing this issue. Many software companies in India are developing commercial software such as content management systems and financial software; however, due to current market needs, they do not recognize the need for a text to speech (TTS) converter. There is a great need to develop a text to speech converter tool with a simple human-computer interface in the local language, to meet the needs of visually impaired people and to lay the foundation for related applications. A text to speech (TTS) conversion tool can effectively address the needs of visually impaired people in India. On the other hand, the leading causes of losing sight are computer displays, TVs and video games.

Objective

General Objective

To make PCs more user friendly by developing a text to speech synthesizer, and to meet the needs of visually impaired people in the Mongolian language.

Specific Objective:

Initial Proposal and Achievement

The breakdown of the development effort is as follows:

Module                                   Time
Reading the tutorial                     30 hrs
Copy-paste mechanism                     15 hrs
Invoking Java in JavaScript              20 hrs
Invoking a shell script in JavaScript    10 hrs

Project Methodology

The Development Methodology

The add-on as a whole comprises two subsystems: the interface part and the text to speech conversion engine. The interface part is the ordinary Mozilla Firefox window with some text selected. The conversion engine is “espeak”, which takes its input in text format. The general architecture of the add-on is shown below.

“eSpeak” (Speech Synthesizer)

eSpeak is a compact open source software speech synthesizer for English and other languages.

It can run as a command line program to speak text from a file or from stdin.

Features of eSpeak:

• Includes different voices, whose characteristics can be altered.

• Can produce speech output as a WAV file.

• SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.

• Compact size. The program and its data, including many languages, total about 1 Mbyte.

• Can translate text to phoneme codes, so it could be adapted as a front end for another speech synthesis engine.

• Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcomed.

• Development tools available for producing and tuning phoneme data.

• Written in C++.

Command Line options for eSpeak

espeak [options] ["text words"]

Text input can be taken either from a file, from a string in the command, or from stdin.

-f <text file>

Speaks a text file.

-a <integer>

Sets amplitude (volume) in a range of 0 to 200. The default is 100.

-p <integer>

Adjusts the pitch in a range of 0 to 99. The default is 50.

-s <integer>

Sets the speed in words-per-minute (approximate values for the default English voice, others may differ slightly). The default value is 170. I generally use a faster speed of 190. Range 80 to 390.

-h or --help

Prints a help message. The first line of output gives the eSpeak version number.

-q

Quiet. No sound is generated. This may be useful with the -x option, which writes phoneme mnemonics to stdout.
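As a rough sketch of how the numeric options above could be assembled from JavaScript, the following helper clamps each setting to the range listed for it and builds an argument list ending in -f. The names ESPEAK_LIMITS, clampSetting and buildEspeakArgs are hypothetical; they are not part of eSpeak or of the add-on's source.

```javascript
// Flags, ranges and defaults as listed in the option descriptions above.
var ESPEAK_LIMITS = {
  amplitude: { flag: "-a", min: 0, max: 200, def: 100 },
  pitch: { flag: "-p", min: 0, max: 99, def: 50 },
  speed: { flag: "-s", min: 80, max: 390, def: 170 }
};

// Clamp a numeric setting into its listed range; fall back to the
// default when the value is not a number.
function clampSetting(name, value) {
  var limit = ESPEAK_LIMITS[name];
  var n = Math.round(Number(value));
  if (isNaN(n)) {
    return limit.def;
  }
  return Math.min(limit.max, Math.max(limit.min, n));
}

// Build an argument list such as ["-s", "190", "-f", "/tmp/Speak_temp.txt"].
// Only the settings that are actually present are emitted.
// (Hypothetical helper for illustration.)
function buildEspeakArgs(settings, textFile) {
  var args = [];
  for (var name in ESPEAK_LIMITS) {
    if (Object.prototype.hasOwnProperty.call(settings, name)) {
      args.push(ESPEAK_LIMITS[name].flag, String(clampSetting(name, settings[name])));
    }
  }
  args.push("-f", textFile);
  return args;
}
```

For example, buildEspeakArgs({ speed: 190 }, "/tmp/Speak_temp.txt") gives the arguments for espeak -s 190 -f /tmp/Speak_temp.txt. Out-of-range values are pulled back to the nearest limit rather than passed through.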

INSTALLATION

Linux and other POSIX systems

There are two versions of the command line program. They both have the same command parameters (see above).

1. espeak uses the speech engine in the libespeak shared library. The libespeak library must first be installed.

2. speak is a stand-alone version which includes its own copy of the speech engine.

Place the espeak or speak executable file in the command path, e.g. in /usr/local/bin.

Place the "espeak-data" directory in /usr/share as /usr/share/espeak-data.

Dependencies

espeak uses the PortAudio sound library (version 18), so you will need to have the libportaudio0 library package installed. It may be already, since it's used by other software, such as OpenOffice.org and the Audacity sound editor.

The speak program may be compiled without using PortAudio, by removing the line

#define USE_PORTAUDIO

in the file speech.h.

Official eSpeak website: http://espeak.sourceforge.net/

Mozilla Firefox Add-On

Technologies used to develop Firefox extensions

Firefox is largely built using four technologies:

1 XUL

2 CSS

3 JavaScript

4 XPCOM.

Mozilla

Extensions are also built using these four technologies.

XML:

Extensible Markup Language (XML) is a meta-language for expressing various kinds of data. It was specified in 1998 by W3C, the organization that sets standards for web-related technologies. It has a number of useful qualities: it is generic, extensible, and easy to validate as well-formed.

CSS: A style language to alter the display of XML documents

CSS is a style-description language defining the display of data marked up in XML and HTML. By separating the structure of the data, expressed through HTML or XML, from the display style, indicated by CSS, data can be reused better than it is when structural and stylistic markup are both embedded in HTML. There are three CSS specifications (Level 1 through Level 3), with progressively more powerful features. The Gecko rendering engine handles nearly all of CSS Level 2 and some of CSS Level 3.

JavaScript

JavaScript is a prototype-based object-oriented language that also permits independent class definitions. Unlike Java, it does not have strict typing, which makes it extremely flexible and gives it qualities that in some senses could be considered similar to Lisp.
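As a small illustration of the prototype-based style and loose typing described above, the following sketch defines an object type with a constructor function and a method shared through its prototype. The Voice type and its fields are hypothetical examples and are not taken from the add-on's source.

```javascript
// Constructor function: plays the role a class would play in Java.
function Voice(name, speed) {
  this.name = name;
  this.speed = speed;
}

// Methods placed on the prototype are shared by every Voice object.
Voice.prototype.describe = function () {
  return this.name + " at " + this.speed + " wpm";
};

var english = new Voice("english", 170);

// Loose typing: the same variable can hold a number, then a string.
var setting = 170;
setting = "fast";
```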

hierarchy

XUL

XUL is an XML-based language, and was developed to be the GUI markup language for the Mozilla browser. There were earlier experiments, going back a long way, in developing user interfaces using a combination of HTML and scripting languages, and XUL can be considered an evolutionary step from them.

For more on XUL: https://developer.mozilla.org/En/Firefox_addons_developer_guide/Introduction_to_XUL%E2%80%94How_to_build_a_more_intuitive_UI

Using XPCOM

XPCOM is a framework for developing platform-independent components. Components developed in line with that framework are referred to as XPCOM components, and sometimes the components are simply referred to as XPCOMs.

It is mainly used here for creating and executing files.

For more on XPCOM:

https://developer.mozilla.org/En/Firefox_addons_developer_guide/Using_XPCOM%E2%80%94Implementing_advanced_processes

Implementation

Contents of the package:

Chrome

“Chrome” is the word used to describe all the GUI structural elements that go into an XUL application.

Three kinds of packages make up chrome:

The content package

This package is used to contain the main XUL and JavaScript source files. Most extensions consist of a single content package.

The locale package

This package is used to contain language data that can be translated. To make an extension’s GUI support multiple languages, you can include multiple locale packages, one for each language.

The skin package

This is used to include source files used as visual elements in the GUI, including style sheets and images. Most extensions include only one skin package, but you can include multiple skin packages to allow the GUI to change with different themes.

Architecture

architecture

Chrome URL

Use a file called “chrome.manifest” to register chrome packages with Firefox and start using them. To register a package, you use a special URI scheme called a “Chrome URL” to represent the path to the file. Chrome URLs are structured as:

chrome://<package name>/<package type>/<relative path to source file>
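A Chrome URL names a package, one of the three package types (content, locale or skin) and a path inside that package. The sketch below composes and splits such URLs to make the three parts explicit. The helper names are hypothetical, and "speak" as a package name in the usage note is an assumption, not taken from the add-on's manifest.

```javascript
// The three kinds of chrome package described above.
var CHROME_PACKAGE_TYPES = ["content", "locale", "skin"];

// Compose a Chrome URL from its three parts. (Hypothetical helper.)
function buildChromeUrl(packageName, packageType, relativePath) {
  if (CHROME_PACKAGE_TYPES.indexOf(packageType) === -1) {
    throw new Error("Unknown chrome package type: " + packageType);
  }
  return "chrome://" + packageName + "/" + packageType + "/" + relativePath;
}

// Split a Chrome URL back into its parts, or return null if it does
// not follow the chrome://package/type/path pattern.
function parseChromeUrl(url) {
  var match = /^chrome:\/\/([^\/]+)\/(content|locale|skin)\/(.*)$/.exec(url);
  if (!match) {
    return null;
  }
  return {
    packageName: match[1],
    packageType: match[2],
    relativePath: match[3]
  };
}
```

With the assumed package name, buildChromeUrl("speak", "content", "overlay.xul") returns chrome://speak/content/overlay.xul.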

File name            Role
install.rdf          Called the install manifest, this gives basic information about the extension, and is required in order for the extension to be installed in Firefox.
chrome.manifest      The chrome manifest described in the earlier section. Registers packages and invokes cross-package overlays.
overlay.xul          XUL file that will be overlaid on the Firefox browser window, adding buttons, menu items, etc.
speak.xul, speak.js  The XUL to display a clock in the window, and the JavaScript to control its operation (these files will be used in Phase 2).

folders of the package

Main components of the speak.js file

function Speak_speak()

Accessing the file: I created an XPCOM component to handle file input and output. The basic idea is to create a temporary file in the /tmp folder and to run espeak on that file through a shell script.

Reference for creating a temporary file: https://developer.mozilla.org/En/Firefox_addons_developer_guide/Using_XPCOM%e2%80%94Implementing_advanced_processes

espeak -f /tmp/Speak_temp.txt

Reference to do this: https://developer.mozilla.org/en/Java_in_Firefox_Extensions
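The temporary-file approach described above can be sketched roughly as follows, using the legacy XPCOM file and process interfaces. This is an illustration under stated assumptions, not the add-on's actual code: the function names are hypothetical, the espeak location /usr/local/bin/espeak is assumed, and the sketch starts espeak directly instead of going through a shell script. It only works in a privileged extension context where the global Components object exists.

```javascript
// Assumed locations; adjust to the local installation.
var SPEAK_TEMP_PATH = "/tmp/Speak_temp.txt";
var ESPEAK_PATH = "/usr/local/bin/espeak";

// Wrap a path in an XPCOM file object.
function openLocalFile(path) {
  var file = Components.classes["@mozilla.org/file/local;1"]
    .createInstance(Components.interfaces.nsILocalFile);
  file.initWithPath(path);
  return file;
}

// Write the selected text to the temporary file, replacing old contents.
function writeTempFile(text) {
  var stream = Components.classes["@mozilla.org/network/file-output-stream;1"]
    .createInstance(Components.interfaces.nsIFileOutputStream);
  // 0x02 = write only, 0x08 = create if missing, 0x20 = truncate
  stream.init(openLocalFile(SPEAK_TEMP_PATH), 0x02 | 0x08 | 0x20,
              parseInt("0644", 8), 0);
  stream.write(text, text.length);
  stream.close();
}

// Start "espeak -f /tmp/Speak_temp.txt" without blocking the browser.
function speakTempFile() {
  var process = Components.classes["@mozilla.org/process/util;1"]
    .createInstance(Components.interfaces.nsIProcess);
  process.init(openLocalFile(ESPEAK_PATH));
  var args = ["-f", SPEAK_TEMP_PATH];
  process.run(false, args, args.length);
}
```

Note that writing the string this way treats it as raw bytes, so text outside ASCII would need a converter stream (nsIConverterOutputStream) to be written as UTF-8.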

function CopyToClipboard()

netscape.security.PrivilegeManager.enablePrivilege('UniversalXPConnect');

https://developer.mozilla.org/en/Using_the_Clipboard

For security: http://www.mozilla.org/editor/midasdemo/securityprefs.html

http://ntt.cc/2008/01/19/copy-paste-javascript-codes-ie-firefox-opera.html#more-33

function CreateFile()

An XPCOM component is used to create the file. Reference for creating a temporary file: https://developer.mozilla.org/En/Firefox_addons_developer_guide/Using_XPCOM%e2%80%94Implementing_advanced_processes

function KeepQuiet()

This silences the voice. It simply runs a shell script from JavaScript.

It runs the Speak_quiet.sh file, which contains the command

espeak -q

function Speakpitchinc() & Speakpitchdec()

To increase and decrease the pitch.

It runs the Speak_pitch.sh file, which contains the command

espeak -p <integer>

function SpeakIncrease() & SpeakDecrease()

To increase and decrease the volume.

It runs the Speak_Incr.sh file and passes it an integer parameter, which is incremented or decremented after every button click. The script contains the command

espeak -a <integer>

function SpeakSpeedinc() & SpeakSpeeddec()

To increase and decrease the speed.

It runs the Speak_Incr.sh file and passes it an integer parameter, which is incremented or decremented after every button click. The script contains the command

espeak -s <integer>
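The increment and decrement behaviour described for the volume and speed buttons could be modelled along the following lines. This is a minimal sketch, not the add-on's code: makeStepper and the step sizes of 10 and 20 are assumptions, while the ranges and starting values are the ones listed for the -a and -s options.

```javascript
// Create a counter that moves in fixed steps and never leaves [min, max].
// (Hypothetical helper for illustration.)
function makeStepper(initial, min, max, step) {
  var value = initial;
  return {
    increase: function () {
      value = Math.min(max, value + step);
      return value;
    },
    decrease: function () {
      value = Math.max(min, value - step);
      return value;
    },
    current: function () {
      return value;
    }
  };
}

// Ranges and defaults from the eSpeak options; step sizes are assumed.
var volume = makeStepper(100, 0, 200, 10); // value for espeak -a
var speed = makeStepper(170, 80, 390, 20); // value for espeak -s

// Arguments that would be handed to the shell script after a click.
function volumeArgs() {
  return ["-a", String(volume.current())];
}

function speedArgs() {
  return ["-s", String(speed.current())];
}
```

Each button handler would call increase() or decrease() once and then pass the resulting argument pair on. Clamping in the stepper keeps repeated clicks from pushing a value outside the range eSpeak accepts.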

Challenges

Initially I thought that I had to go through the clipboard to get the selected text, so I lost time trying to access the Firefox clipboard, which is locked for security reasons. The selected text can instead be read directly from the focused window:

var Speak_focusedWindow = document.commandDispatcher.focusedWindow;

var Speak_txt = Speak_focusedWindow.getSelection().toString();

Reference to do this: https://developer.mozilla.org/en/Java_in_Firefox_Extensions

Reference to do this: http://forums.mozillazine.org/viewtopic.php?t=446245

Conclusions

www.cse.iitb.ac.in/~neelamadhavg

Source for Slide Show in XHTML: http://www.w3.org/Amaya/
