VBA: Text->Img


Introduction

In reading eye-tracking studies, stimuli can be presented in two ways: as text or as images.

  Text-based
    Pros:
      • Easy to generate and change
      • Easy formatting markup
    Cons:
      • Font and size rendering are system-dependent (e.g., Unicode font support)
      • Rendering the formatting can be hard
      • Rasterizing takes time during the experiment
      • Complex code in the experiment program, prone to errors
      • Difficult to replicate in data visualization
  Image-based
    Pros:
      • Once rendered, never changes
      • Simple and consistent to handle
      • Abundant image libraries for all programming languages
      • System-independent
      • Works for all writing systems
      • Can be mixed with images
    Cons:
      • Difficult to create
      • Larger size, many more files
      • Once rendered, cannot be changed

 

The code posted here addresses the main drawback of the image-based approach: converting text, formatted and possibly mixed with images, into image files.

Virtual Printer Driver

In the past I tried using a virtual printer driver, such as those bundled with fax software. These drivers are problematic because they often cannot handle paper sizes specified in pixels, or they support only limited color schemes (often black and white only).

A recent search led me to the Universal Document Converter, a virtual printer driver that supports a variety of graphic formats, including PNG. Particularly attractive is the ability to control the driver via VBA, where one can set the output file name and paper size. With some added VBA code to output the SEG (segmentation) info, this seems to be a complete solution. Cost: $69.

Word-Acrobat Writer

There is also a solution with no additional cost: print the DOC to PDF, then "save as" images. The trick is to set the proper paper size in both Word and Acrobat, working backwards from the desired pixel dimensions:

  • PDF->PNG conversion seems to always run at 200 DPI, and there is no parameter to change it. This means that for an output of 800×600 pixels, you want a 4×3 inch paper size (800/200 = 4, 600/200 = 3).
  • In Word, set the paper size to 4×3 inches and adjust the margins. Use 6-point fonts if you want them to appear as 12-point text on screen (the 200 DPI output has roughly twice the pixel density of a typical screen). Zoom to 200% if you want to preview how it will look.
  • Print to PDF. Here you need to set a "customized paper size" of 4×3 inches. [I created a new printer profile set to 100 DPI, but apparently it made no difference. I guess the "standard" settings can be used except for the paper size.]
  • In Acrobat, choose "Save As" and pick PNG or TIFF (not JPEG for black-and-white text, because its lossy compression blurs sharp edges). Pages are saved as "filename_page_1.png", and so on.
  • Finally, run a VBA script to output the SEG info. Renaming the image files is possible but may not be worth the effort.
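
The backward calculation above is plain arithmetic. Here is a minimal Python sketch of it. It assumes Acrobat's fixed 200 DPI export, as observed above, and a 100 DPI screen as the reference for the 6-point/12-point rule. Windows often reports 96 DPI, which would give a slightly different ratio. The function names are hypothetical.

```python
EXPORT_DPI = 200       # Acrobat's PDF->PNG resolution, as observed above
POINTS_PER_INCH = 72   # a typographic point is 1/72 inch


def paper_size_inches(width_px, height_px, dpi=EXPORT_DPI):
    """Paper size (inches) that rasterizes to width_px x height_px at `dpi`."""
    return width_px / dpi, height_px / dpi


def points_to_pixels(points, dpi):
    """Pixel height of text set at `points` and rendered at `dpi`."""
    return points / POINTS_PER_INCH * dpi


def document_font_points(screen_points, screen_dpi=100, dpi=EXPORT_DPI):
    """Font size to set in Word so that, rasterized at `dpi`, it matches
    `screen_points` shown on a screen of `screen_dpi` (assumed 100 DPI)."""
    return screen_points * screen_dpi / dpi


if __name__ == "__main__":
    print(paper_size_inches(800, 600))   # (4.0, 3.0)
    print(document_font_points(12))      # 6.0
    # Both routes give about 16.7 px of text height:
    print(points_to_pixels(6, 200), points_to_pixels(12, 100))
```

The same functions work for any target. For example, a 1024×768 image at 200 DPI needs a 5.12×3.84 inch page.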

PowerPoint

The latest version of PowerPoint (Office 2000) can export a slide (or a whole presentation) to various image formats, and it handles page size fairly well. The biggest problem is that PowerPoint does not paginate automatically: when you paste a long text into a text box on a slide, most of the lines fall outside the slide.

The PPT to Image Converter

The script described here takes care of this problem. After the (long) text is pasted in, you run the script. It paginates the text, saves each slide as a GIF file, and creates an XML file that records the coordinates of each line, word, and character within each slide.
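
The core idea of the pagination step can be shown independently of PowerPoint. This is not the author's VBA code, which is in the demo file. It is a minimal Python sketch that makes two assumptions. First, the text has already been broken into lines. Second, line height is simply font size times line spacing. The second assumption is a simplification, because real line height also depends on the font's ascent and descent. Consecutive lines are split into fixed-size pages, one per slide. All names and values are hypothetical.

```python
import math


def lines_per_page(page_height_pt, top_margin_pt, bottom_margin_pt,
                   font_size_pt, line_spacing=1.0):
    """Number of uniform-height lines that fit in the printable area.

    Returns at least 1 so that pagination always makes progress.
    """
    usable = page_height_pt - top_margin_pt - bottom_margin_pt
    line_height = font_size_pt * line_spacing  # simplified line height
    return max(1, math.floor(usable / line_height))


def paginate(lines, per_page):
    """Split a list of lines into consecutive pages of at most `per_page` lines."""
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]


if __name__ == "__main__":
    # Hypothetical slide: 540 pt tall, 36 pt top/bottom margins,
    # 24 pt text, double-spaced.
    n = lines_per_page(540, 36, 36, 24, line_spacing=2.0)
    print(n)  # 9
    text_lines = [f"line {k}" for k in range(1, 16)]
    pages = paginate(text_lines, n)
    print([len(p) for p in pages])  # [9, 6]
```

Changing any hard-coded parameter changes `lines_per_page`. This is why keeping the un-paginated original around makes re-pagination easy.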

Features:

  • Automatic pagination: paste your text into the text frame on the first slide, press Alt-F8, and run the script. It generates all the pages for you.
  • Rich text formatting support: any basic text formatting that PowerPoint supports.
  • Uniform text properties: as it stands, several key parameters, such as text size, font type, and line spacing, are specified in the code. This keeps the output uniform. If you prefer WYSIWYG, remove the corresponding code.
  • Unicode support: both the image output and the XML output file that specifies word coordinates support Unicode.
  • Multiple image format support: simply change the "GIF" extension in the code to any format that PowerPoint supports.
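
The XML segmentation file can be pictured as a nested page/line/word structure. The element and attribute names below are a hypothetical schema, not necessarily what the script writes. The sketch uses Python's standard `xml.etree.ElementTree` and keeps coordinates in pixels. Character-level boxes could be added the same way, as children of each word.

```python
import xml.etree.ElementTree as ET


def build_segmentation(pages):
    """Build a hypothetical segmentation tree.

    `pages` is a list of (image_name, lines) pairs. Each line is a list of
    (word, x, y, width, height) tuples, with coordinates in pixels.
    """
    root = ET.Element("segmentation")
    for image_name, lines in pages:
        page_el = ET.SubElement(root, "page", image=image_name)
        for line_no, words in enumerate(lines, start=1):
            line_el = ET.SubElement(page_el, "line", n=str(line_no))
            for text, x, y, w, h in words:
                word_el = ET.SubElement(line_el, "word", x=str(x), y=str(y),
                                        w=str(w), h=str(h))
                word_el.text = text
    return ET.ElementTree(root)


if __name__ == "__main__":
    # Made-up coordinates for one page with one line of two words.
    pages = [("demo_1.gif", [[("Hello", 40, 30, 62, 20), ("世界", 110, 30, 40, 20)]])]
    tree = build_segmentation(pages)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To save the tree, `tree.write("basename.xml", encoding="utf-8", xml_declaration=True)` writes a UTF-8 file with an explicit encoding declaration, which keeps non-ASCII words intact.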

To run this script:

  1. First, download the demo PowerPoint file. It includes the VBA code along with a slide containing text that is too long to fit on one page.
  2. Go to the Visual Basic Editor: menu Tools->Macro->Visual Basic Editor, or Alt-F11.
  3. Run the script. It asks for a base name, which defaults to the name of the PPT file; change it if you like. All image files are saved in the same directory as the PPT file, with names like basename_1.gif. The XML file with the segmentation coordinates is saved as basename.xml.
  4. Within a few seconds it completes and tells you to check for the files. You will find additional slides created in the process. Save the resulting PPT file so that you can regenerate the images later if needed (File->Save As, then choose an image format). However, I'd recommend NOT overwriting the original PPT, so that you can re-paginate later, for example to change the number of lines per page.
  5. Parameters such as page size, margins, line spacing, and font are hard-coded. If you want to change them, you need to edit the code.

To-do:

  1. Error: doesn't handle non-ASCII characters (codes above 127) in the XML output. At least Python's default XML parser, "minidom", rejects the file when they appear.
  2. A GUI for controlling the parameters.
  3. Flexibility to customize font size, color, etc. This should be easy: just remove the hard-coded values.
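
XML itself permits non-ASCII characters. A parser rejects them only when the file's bytes do not match its declared encoding, which is UTF-8 if none is declared. A likely, but unconfirmed, cause here is that the VBA script writes the file in the Windows "ANSI" code page (cp1252). The minimal Python sketch below reproduces that failure with `minidom` and shows two possible fixes. One is to write UTF-8; the other is to escape non-ASCII characters as numeric character references.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# No encoding declaration, so the parser assumes UTF-8.
xml_text = "<seg><word>café</word></seg>"

utf8_bytes = xml_text.encode("utf-8")        # matches the assumed encoding
cp1252_bytes = xml_text.encode("cp1252")     # Windows "ANSI": é becomes byte 0xE9
ascii_bytes = xml_text.encode("ascii", "xmlcharrefreplace")  # é becomes &#233;


def parse_word(data):
    """Return the text of the first <word> element in an XML byte string."""
    doc = minidom.parseString(data)
    word = doc.getElementsByTagName("word")[0]
    return "".join(node.data for node in word.childNodes)


if __name__ == "__main__":
    print(parse_word(utf8_bytes))   # café
    print(parse_word(ascii_bytes))  # café
    try:
        parse_word(cp1252_bytes)
    except ExpatError as err:
        # The lone 0xE9 byte is not valid UTF-8, so expat rejects the file.
        print("rejected:", err)
```

The escaping route keeps the file pure ASCII, so it parses correctly whatever encoding a reader assumes.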

Download:

 

