The PDF to MP3 project provides alternative access to
academic documents for blind people, weak readers, and others. Using an MP3
player, the listener hears a synthesized reading of the paper. The
audio is divided into tracks and read in multiple voices to support
the listener’s navigation and comprehension. The software
performs the PDF to MP3 conversion using the Python programming language
and third-party components.
Break Update (ppt)
The demo presents an audiotext version of:
M. Kamel and James A. Landay, Sketching Images Eyes-free: A Grid-based
Dynamic Drawing Tool for the Blind, Assets 2002.
The MP3 files can be heard using an MP3 player, for example, on a
desktop with Winamp or the Creative Muvo MP3 player.
The zipped version of the audio is available at:
The prototype of the conversion system has been implemented
from existing software components; below are links to the software. The
Python programming language is used to combine the different components
of the system. The “pdftohtml” software extracts text and font
settings from a PDF and converts them into HTML. The Python HTML
parser reads in the HTML file, divides the paper’s components based
on format, and stores the text in a data structure. The text in the data
structure is processed by Microsoft's TTS engine using the male
voice to emphasize keywords and the female voice to read text. Microsoft
Speech Application Programming Interface (SAPI) also provides the
functionality to save the spoken word into WAV files. The WAV files are
converted to MP3 files using LAME software. The final
audiotext is about 30-45 minutes long depending on paper length and
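The conversion steps described above can be sketched in Python as follows. This is only an illustrative outline, not the project's code: the function names, the track file naming, and the choice to read each heading in the male voice are assumptions, and the speech step itself (SAPI on Windows) is left out. The two command builders use the basic "input output" invocation of pdftohtml and LAME and only return the argument lists; nothing is executed here.

```python
from pathlib import Path

MALE, FEMALE = "male", "female"  # labels only; mapping to real TTS voices is left open


def pdftohtml_command(pdf_path, html_stem):
    """Argument list for the basic pdftohtml call: pdftohtml <input.pdf> <output>."""
    return ["pdftohtml", str(pdf_path), str(html_stem)]


def lame_command(wav_path, mp3_path):
    """Argument list for the basic LAME call: lame <input.wav> <output.mp3>."""
    return ["lame", str(wav_path), str(mp3_path)]


def plan_tracks(sections, out_dir="audio"):
    """Build one plan per (heading, body) section.

    Each plan lists which voice reads which text and where the WAV and MP3
    files for that track would be written (hypothetical naming scheme).
    """
    plans = []
    for number, (heading, body) in enumerate(sections, start=1):
        stem = Path(out_dir) / f"track{number:02d}"
        plans.append({
            # Assumption: the heading is treated as the emphasized text.
            "utterances": [(MALE, heading), (FEMALE, body)],
            "wav": stem.with_suffix(".wav"),
            "mp3": stem.with_suffix(".mp3"),
        })
    return plans


if __name__ == "__main__":
    sections = [("Abstract", "We present a drawing tool."),
                ("Introduction", "Blind users need access to images.")]
    print(pdftohtml_command("paper.pdf", "paper"))
    for plan in plan_tracks(sections):
        print(plan["utterances"])
        print(lame_command(plan["wav"], plan["mp3"]))
```

On a machine where both tools are installed, each returned list could be passed to subprocess.run to perform the actual conversion.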
The prototype is limited to converting papers of a fixed format, in
which headings are the only bold text. The parts of the paper, such as
title, abstract, and sections, are identified as being between headings.
Documents with other formats could be converted but would be arbitrarily
divided into audio tracks. The listener can still listen to the paper
but does not have convenient access to sections.
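The heading rule above (bold text marks a heading, and everything up to the next heading belongs to that part) can be illustrated with Python's standard html.parser module. This is a minimal sketch under stated assumptions, not the project's parser: the class and function names are hypothetical, and it assumes the HTML marks bold text with b or strong tags.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect [heading, body] pairs, treating any bold text as a heading."""

    def __init__(self):
        super().__init__()
        self.sections = []          # list of [heading, body] pairs
        self._bold_depth = 0        # > 0 while inside <b> or <strong>
        self._heading_open = False  # True while bold chunks extend one heading

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self._bold_depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self._bold_depth > 0:
            self._bold_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        if self._bold_depth > 0:
            if self._heading_open:
                self.sections[-1][0] += " " + text
            else:
                self.sections.append([text, ""])
                self._heading_open = True
        else:
            self._heading_open = False
            if not self.sections:
                self.sections.append(["", ""])  # text before the first heading
            body = self.sections[-1][1]
            self.sections[-1][1] = (body + " " + text).strip()


def split_sections(html):
    """Return a list of (heading, body) tuples found in the HTML string."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [tuple(section) for section in parser.sections]


if __name__ == "__main__":
    sample = ("<b>Abstract</b><br>We present a tool.<br>"
              "<b>1. Introduction</b><br>Blind users<br>need access.")
    for heading, body in split_sections(sample):
        print(heading, "->", body)
```

The sketch also shows why other formats break the division: any bold phrase inside a paragraph would start a new section, so tracks would no longer line up with the paper's real structure.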
On-line software documentation
Zipped Python Code
- Mock_main/ mock_html: Converts HTML files with a specific
format into MP3 files as described in the design rationale. In practice,
I use the pdf-to-html executable to convert the PDF and manually
impose the expected HTML format.
- Helloworld.py/ myhtml.py – Main program that calls PDF-HTML
conversion, HTML parsing, TTS, and WAV-MP3 conversion. This version
will convert any PDF to MP3 and divide the paper into sections. The
text is arbitrarily divided into sections because the PDF headings are
not reliably extracted (a difficult CS problem).
- PDF-HTML – DOS executable built from the source code.
- MS TTS – Python has access to SAPI after the COM objects
are wrapped in Python. Instructions
for wrapping MS TTS.
- LAME WAV-MP3 conversion – DOS executable and DLL built from