PDF to MP3 converter: alternative access to academic papers
 
Overview
The PDF to MP3 project provides alternative access to academic papers for blind people, weak readers, and others. The listener plays a synthesized reading of the paper on an MP3 player. The audio is divided into tracks and read in multiple voices to help the listener navigate and follow the paper. The software performs the PDF to MP3 conversion with Python code that combines third-party components.


Design Rationale
Online version
PDF version
Presentations
Spring Break Update (ppt)
Final Presentation (ppt)
Demo
The demo is an audiotext version of:

Hesham M. Kamel and James A. Landay, Sketching Images Eyes-free: A Grid-based Dynamic Drawing Tool for the Blind, Assets 2002.

The MP3 files can be played on any MP3 player, for example Winamp on a desktop computer or a portable Creative Muvo player.

The zipped version of the audio is available at: \\dorianm-cs\research\enable_tech\present
 

Implementation
The prototype conversion system is assembled from existing software components, linked below, with Python code tying them together. The “pdftohtml” program extracts the text and font settings from a PDF and writes them out as HTML. A Python HTML parser reads the HTML file, divides the paper into its components based on formatting, and stores the text in a data structure. Microsoft's TTS engine then reads the stored text aloud, using a male voice to emphasize keywords and a female voice for the body text. The Microsoft Speech Application Programming Interface (SAPI) can also save the synthesized speech to WAV files, which the LAME encoder then converts to MP3 files. The final audiotext is about 30 to 45 minutes long, depending on paper length and reading speed.
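The parsing step can be sketched with Python's standard `html.parser` module. This sketch assumes that pdftohtml's default (non-complex) HTML output marks bold text with `<b>` tags; the class name `BoldRunParser` and the function `extract_runs` are hypothetical, and the original myhtml.py may work differently. It reduces the page to a list of `(is_bold, text)` runs, merging adjacent runs that share the same formatting.

```python
from html.parser import HTMLParser


class BoldRunParser(HTMLParser):
    """Collect text runs from pdftohtml-style HTML, flagging bold runs.

    Assumes bold text is wrapped in <b>...</b>; all other tags are ignored.
    """

    def __init__(self):
        super().__init__()
        self.bold_depth = 0   # > 0 while inside one or more <b> elements
        self.runs = []        # list of (is_bold, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.bold_depth > 0:
            self.bold_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and line breaks
        if not text:
            return
        is_bold = self.bold_depth > 0
        # Merge with the previous run when the formatting is unchanged,
        # so lines split by <br> rejoin into one paragraph of text.
        if self.runs and self.runs[-1][0] == is_bold:
            self.runs[-1] = (is_bold, self.runs[-1][1] + " " + text)
        else:
            self.runs.append((is_bold, text))


def extract_runs(html_text):
    """Return the (is_bold, text) runs found in an HTML string."""
    parser = BoldRunParser()
    parser.feed(html_text)
    parser.close()
    return parser.runs
```

Keeping only the bold flag, rather than full font settings, matches the fixed-format assumption that bold text marks headings.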

Block diagram of conversion software components. Input is PDF and output is MP3. The conversion software is in Python.

The prototype is limited to papers of a fixed format in which headings are the only bold text. The parts of the paper, such as the title, abstract, and sections, are identified as the text between consecutive headings. Documents in other formats can still be converted, but they are divided into audio tracks arbitrarily, so the listener can hear the whole paper but cannot conveniently jump to a particular section.
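The fixed-format rule can be sketched as a function over `(is_bold, text)` runs such as an HTML parser might produce. The run format, the dictionary layout, the function name `split_into_sections`, and the 500-word fallback chunk size are assumptions for illustration only. The fallback mimics the arbitrary division used when a document has no usable headings.

```python
def split_into_sections(runs, chunk_words=500):
    """Group (is_bold, text) runs into sections, one per audio track.

    Bold runs are treated as headings; each section is a heading plus the
    plain text up to the next heading. Text before the first heading (for
    example, the title) becomes a section with no heading. If there is no
    bold text at all, fall back to fixed-size chunks of `chunk_words` words.
    """
    if not any(is_bold for is_bold, _ in runs):
        words = " ".join(text for _, text in runs).split()
        return [
            {"heading": None, "body": " ".join(words[i:i + chunk_words])}
            for i in range(0, len(words), chunk_words)
        ]

    sections = []
    current = {"heading": None, "body": []}
    for is_bold, text in runs:
        if is_bold:
            # A heading closes the current section (if it has any content)
            # and opens a new one.
            if current["heading"] is not None or current["body"]:
                sections.append(current)
            current = {"heading": text, "body": []}
        else:
            current["body"].append(text)
    sections.append(current)
    # Join each section's body runs into a single string for the TTS step.
    return [{"heading": s["heading"], "body": " ".join(s["body"])} for s in sections]
```

Chunking by word count keeps fallback tracks roughly equal in length, which is about the best navigation aid available when headings cannot be found.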
 

On-line software documentation

Developed software

My Zipped Python Code

  • Mock_main / mock_html: Converts HTML files in a specific format into MP3 files, as described in the design rationale. In practice, I use the pdftohtml executable to convert the PDF and then manually impose the expected HTML format. 
  • Helloworld.py / myhtml.py – Main program that calls the PDF-HTML conversion, HTML parsing, TTS, and WAV-MP3 conversion. This version converts any PDF to MP3 and divides the paper into sections, but the division is arbitrary because headings cannot be reliably extracted from a PDF (a difficult CS problem).
  • PDF-HTML – DOS executable built from the source code.
  • MS TTS – Python can access SAPI once the SAPI COM objects are wrapped for Python. Instructions for wrapping MS TTS
  • LAME WAV-MP3 conversion – DOS executable and DLL built from source code.
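The main program chains the components listed above. The following is a minimal sketch of that orchestration, not the original Helloworld.py. It assumes `pdftohtml` and `lame` are on the PATH, that the pywin32 package provides COM access to SAPI 5 on Windows, and that each section is a dictionary with `heading` and `body` keys. The function names, the track naming scheme, and the voice indices (0 for the male voice, 1 for the female voice) are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical SAPI voice indices; which installed voice is male or female
# varies by machine, so check voice.GetVoices() on the target system.
MALE_VOICE, FEMALE_VOICE = 0, 1


def pdftohtml_cmd(pdf_path, html_base):
    """pdftohtml command: -noframes writes one HTML page, -i skips images."""
    return ["pdftohtml", "-noframes", "-i", str(pdf_path), str(html_base)]


def lame_cmd(wav_path, mp3_path):
    """Basic LAME usage: lame <input.wav> <output.mp3>."""
    return ["lame", str(wav_path), str(mp3_path)]


def track_name(index, out_dir):
    """Zero-padded track numbers keep the files in reading order."""
    return Path(out_dir) / f"track{index:02d}"


def voice_parts(section):
    """Heading in the male (emphasis) voice, body text in the female voice."""
    parts = []
    if section["heading"]:
        parts.append((MALE_VOICE, section["heading"]))
    if section["body"]:
        parts.append((FEMALE_VOICE, section["body"]))
    return parts


def speak_to_wav(parts, wav_path):
    """Render (voice_index, text) parts into one WAV file via SAPI 5.

    Windows only: pywin32 is imported here so the rest of the module
    still loads on other systems.
    """
    import win32com.client

    SSFM_CREATE_FOR_WRITE = 3  # SpeechStreamFileMode: create file for writing
    voice = win32com.client.Dispatch("SAPI.SpVoice")
    stream = win32com.client.Dispatch("SAPI.SpFileStream")
    stream.Open(str(wav_path), SSFM_CREATE_FOR_WRITE)
    voice.AudioOutputStream = stream  # send speech to the file, not speakers
    voices = voice.GetVoices()
    for voice_index, text in parts:
        voice.Voice = voices.Item(voice_index)
        voice.Speak(text)  # synchronous by default
    stream.Close()


def convert_sections(sections, out_dir, speak=speak_to_wav, run=subprocess.run):
    """Speak each section into its own WAV track, then encode it with LAME."""
    mp3_files = []
    for i, section in enumerate(sections, start=1):
        base = track_name(i, out_dir)
        wav, mp3 = base.with_suffix(".wav"), base.with_suffix(".mp3")
        speak(voice_parts(section), wav)
        run(lame_cmd(wav, mp3), check=True)
        mp3_files.append(mp3)
    return mp3_files
```

For example, `subprocess.run(pdftohtml_cmd("paper.pdf", "paper"), check=True)` would produce the HTML for the parsing step. Because `speak` and `run` are parameters, the track logic can be exercised with stand-in functions on a machine without SAPI or LAME installed.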
     
    dorian miller, 4/28/2003