The PDF to MP3 project provides alternative access to academic
documents for blind people, weak readers, and others. Using an MP3 player,
the person listens to a synthesized reading of the paper. The audio is divided
into tracks and read in multiple voices to aid the listener’s navigation
and comprehension. The software performs the PDF to MP3 conversion
using the Python programming language and third-party components.
Table of contents
Online version
PDF version (5/5/03)
Revised version of PDF (6/16/03)
Revised version of PDF (7/30/03)
Spring Break Update (ppt)
Final Presentation (ppt)
The demo is an audiotext version of:
Hesham M. Kamel and James A. Landay, Sketching Images
Eyes-free: A Grid-based Dynamic Drawing Tool for the Blind, Assets 2002.
The MP3 files can be played on any MP3 player, for example Winamp on a
desktop computer or the Creative Muvo MP3 player.
The zipped version of the audio is available at: \\dorianm-cs\research\enable_tech\present
The prototype of the conversion system is built from existing software
components, which are linked below. The Python programming language
combines the different components of the system. The “pdftohtml” software
extracts text and font settings from a PDF and converts them into HTML.
The Python HTML parser reads in the HTML file, divides the paper into its
components based on formatting, and stores the text in a data structure.
Microsoft's text-to-speech (TTS) engine then reads the stored text, using
the male voice to emphasize keywords and the female voice for the remaining
text. The Microsoft Speech Application Programming Interface (SAPI) also
provides the functionality to save the spoken words into WAV files. The WAV
files are converted to MP3 files using the LAME software. The final
audiotext is about 30-45 minutes long, depending on paper length and
reading speed.
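
The two external steps at the ends of this chain (PDF to HTML, then WAV to MP3) can be sketched as a small Python driver. This is a minimal illustration, not the project's code: the function names `plan_commands` and `run_if_available` and the track names are hypothetical, and it assumes that `pdftohtml` and `lame` are on the PATH and accept the plain `input output` argument form. The TTS stage that writes the WAV files between the two steps is not shown.

```python
import shutil
import subprocess
from pathlib import Path


def plan_commands(pdf_path, track_names):
    """Return the external commands for the first and last pipeline stages.

    pdf_path and track_names are illustrative inputs; the WAV files are
    assumed to be written by the TTS stage between the two steps.
    """
    pdf = Path(pdf_path)
    html = pdf.with_suffix(".html")
    # Stage 1: extract text and font settings from the PDF into HTML.
    commands = [["pdftohtml", str(pdf), str(html)]]
    # Final stage: one LAME call per audio track.
    for name in track_names:
        wav = pdf.with_name(name + ".wav")
        mp3 = pdf.with_name(name + ".mp3")
        commands.append(["lame", str(wav), str(mp3)])
    return commands


def run_if_available(command):
    """Run a command only when its executable can be found.

    Returns True if the command ran, False if the executable is missing.
    """
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # Print the planned commands instead of running them.
    for cmd in plan_commands("paper.pdf", ["01_title", "02_abstract"]):
        print(" ".join(cmd))
```

Building the command lists separately from running them makes it possible to inspect the plan on a machine where the executables are not installed.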

The prototype is limited to converting papers of a fixed format, in
which headings are the only bold text. The parts of the paper, such as
title, abstract, and sections, are identified as the text between headings.
Documents with other formats can be converted but are divided arbitrarily
into audio tracks. The listener can still listen to the paper but does
not have convenient access to individual sections.
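
The "headings are the only bold text" rule can be illustrated with Python's standard `html.parser` module. The sketch below is an assumption about how such a split might look, not the project's parser: the class name `BoldSectionParser` and the helper `split_sections` are hypothetical, and it assumes that headings appear inside `<b>` tags in the converted HTML.

```python
from html.parser import HTMLParser


class BoldSectionParser(HTMLParser):
    """Split a document into [heading, body] pairs, treating bold text as headings."""

    def __init__(self):
        super().__init__()
        self.sections = []      # list of [heading, body] pairs
        self._in_bold = False

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_bold = True
            self.sections.append(["", ""])   # each bold run opens a new section
        elif tag in ("br", "p") and self.sections:
            self.sections[-1][1] += " "      # keep words on separate lines apart

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored in this sketch
        index = 0 if self._in_bold else 1
        self.sections[-1][index] += data


def split_sections(html_text):
    """Return a list of (heading, body) tuples with whitespace normalized."""
    parser = BoldSectionParser()
    parser.feed(html_text)
    parser.close()
    return [(" ".join(h.split()), " ".join(b.split())) for h, b in parser.sections]


if __name__ == "__main__":
    sample = "<b>Abstract</b><br>We present a tool.<br><b>1. Introduction</b><br>More text."
    for heading, body in split_sections(sample):
        print(heading, "->", body)
```

The sketch also shows why other formats break the division: any bold phrase inside a paragraph would open a new section, so the resulting tracks no longer match the paper's structure.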
On-line software documentation
Developed software
My Zipped Python Code
- Mock_main / mock_html: Converts HTML files with a specific format into
MP3 files, as described in the design rationale. In practice, I use the
pdf-to-html executable to convert the PDF and manually impose the expected
HTML format.
- Helloworld.py / myhtml.py: Main program that calls PDF-HTML conversion,
HTML parsing, TTS, and WAV-MP3 conversion. This version converts any
PDF to MP3 and divides the paper into sections. The text is divided
arbitrarily into sections because the PDF headings are not reliably
extracted (a difficult CS problem).
- PDF-HTML: DOS executable built from the source code.
- MS TTS: Python has access to SAPI after the COM objects are wrapped
in Python. Instructions for wrapping MS TTS.
- LAME WAV-MP3 conversion: DOS executable and DLL built from source
code.
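
For the MS TTS component listed above, one common way to reach SAPI from Python today is the pywin32 package, which may differ from the wrapping procedure in the linked instructions. The sketch below is hedged accordingly: the function name `speak_to_wav` and its `dispatch` parameter are hypothetical, it assumes Windows with SAPI 5 and pywin32 installed, and it does not show voice selection.

```python
SSFM_CREATE_FOR_WRITE = 3  # SAPI SpeechStreamFileMode value: create file for writing


def speak_to_wav(text, wav_path, dispatch=None):
    """Render text into a WAV file through SAPI's COM automation objects.

    dispatch is a hypothetical hook for supplying the COM object factory.
    By default it is pywin32's win32com.client.Dispatch (Windows only).
    """
    if dispatch is None:
        from win32com.client import Dispatch as dispatch  # requires pywin32

    voice = dispatch("SAPI.SpVoice")
    stream = dispatch("SAPI.SpFileStream")
    stream.Open(wav_path, SSFM_CREATE_FOR_WRITE)
    try:
        voice.AudioOutputStream = stream   # send speech to the file, not the speakers
        voice.Speak(text)
    finally:
        stream.Close()                     # always release the file handle
```

On a Windows machine, a call such as `speak_to_wav("Abstract", "01_abstract.wav")` would be expected to write one track, which LAME can then convert to MP3. Passing a stand-in for `dispatch` lets the call sequence be checked on systems without SAPI.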