Multi-Version Documents: October 2012

Tuesday, October 23, 2012

Automating word and line selection

TILT has now been repackaged as an applet. It can auto-recognise words and lines. The user selects the word or line tool and then clicks on a likely candidate to select it. Also shown are the black and white image (which TILT actually uses to do word and line-recognition), and the view showing the detected baselines.

One thing this does show is that an applet is an excellent form for this application. It is based on a stable and powerful software platform (Java), is delivered over the Web, and facilitates both rapid development and user feedback. Although still far from finished, it is surprising what has been possible in just two weeks.

Word-detection

This works by assessing a small square around the click-point. If this is darker than average then it is added to the selection, otherwise it is subdivided into four smaller squares and reassessed. If either the big or one of the small squares are accepted, then the four big squares to the north, south, east and west are assessed similarly. Word boundaries are thus rapidly identified.

Line detection

This works by counting up the concentration of black pixels in each row of the image. Anything below the average intensity of the page is ignored. The peaks are mostly baselines.

Although this simple algorithm won't work for slanted or curved lines, I will later subdivide the image into narrow columns and join up the detected baselines. This should overcome the limitation.

There are still a lot of things it doesn't do yet, such as create actual links, or store the results. But, hey, one step at a time.

Saturday, October 13, 2012

TILT (Text to Image Linking Tool)

Visualising links between image and text on the Web is one thing, but how do you create them in the first place? Unless you have a lot of spare time on your hands and don't get bored easily you may find the available tools aren't up to much. For example, the Harpur poems come in the form of manuscript anthologies, 70-100 pages in length. Even if you restricted yourself to just one link per line it would take quite a while to define image maps using a tool like the Gimp image-map plugin. One ordinary-looking manuscript page from the Harpur collection took me more than one hour, and that was without defining any links to the text. The British Library edition of the Codex Sinaticus looks good, but cost millions of dollars to make (reportedly). There has to be a better way.

What I needed was a simple tool that would truly facilitate an essentially tedious task, and leave me with data in a form I could use. One clear requirement was for the definition and editing of polygonal areas, not just rectangles. Polygons are a must, if anything that is not strictly rectilinear happens in a text, for example a correction, or even if the paper was slightly curved when it was photographed. Since the available manual methods were all much too slow, I decided to write my own tool.

Try defining rectangles on a per-line basis here

There are two current tools that are freely available. One is the Text-Image alignment tool in TextGrid. Although it works quite well it is entirely manual, and it doesn't offer automation. It's also tied in with TextGrid, Eclipse and XML. I didn't want to use any of these. TILE on the other hand is written in Javascript, which would allow it to be usable over the Web. But it appears to be unfinished. Also there is doubt that a program of this complexity can be built quickly enough, and be sufficiently stable and robust using only web-languages. So I decided to use good old Java. This gave me plenty of power, stability, and a single language to develop in. After a few days coding I have nearly exceeded the capabilities of TILE. It can also be turned into an applet, so it can be used on the Web.

So far TILT can define rectangles and polygons on top of a resizable image. Either type of shape can be selected, deleted, moved, scaled and edited by dragging control points. The background image can be also reduced to black and white to facilitate automatic word and line selection. This should be easy to achieve with the image-handling facilities of Java.

I also have to add linking with the text, saving in HTML and JSON formats, and uploading to the server. But all of these are easy to do.

Saturday, October 6, 2012

Text and Facsimile Side-by-Side

One of the most common requests you hear about in digital humanities is to display a facsimile next to its transcription. The TILE project is one example, another was developed some years back, perhaps for the first time, by some guys at the University of Kentucky, for their electronic Beowulf edition. As soon as you think about putting a facsimile next to its transcription you see the need to link areas of the image to spans of text in the transcription. Sometimes the facsimile is hard to read, and it only becomes useful if someone has already gone over it to link those areas and spans to help the reader to work out what corresponds to what. This gives rise to two main problems:

How to provide an environment to define those links and save them
How to display this information and update it as the user scrolls through the text and zooms in on the image

Both solutions must work over the Web, because that is the medium everyone wants to work in. TILE dealt with problem 1. The Beowulf edition was mostly about 2, I think (you have to buy the DVD to find out). We are tackling both problems as part of our AustESE project - to provide an editing environment for electronic scholarly editions. Since someone else is tackling problem 1 I'll be talking about my solution to problem 2. I'd be curious to know if anyone else has already developed something similar.

How it works

This is an screen dump of the live demo on AustESE. The user has already zoomed into an area of interest and has just moved the mouse over a polygonal area. The correct span of text on the right has been highlighted. This seems disarmingly simple but is quite hard to achieve in the medium of the Web. Years ago, when HTML was first developed, imagemaps were invented that allowed rectangular, circular and polygonal AREAs to be defined for an image. They can be defined for example in a tool like Gimp. Mouse movements and mouse clicks could be detected through the maps to fire events that might, for example, be used to highlight parts of the transcription. What was not provided, however, was any way to style or graphically affect the image map. So to make the map more interactive we need to employ some fancy new Web tricks.

The first thing I found that highlights areas on an image map is a thing called maphighlight. Unfortunately it doesn't do zooming, scaling or scrolling. Scaling is required to fit the image and its map onto the user's screen size. Zooming is used to drill down to details on an image that is unlikely to be readable on many devices otherwise. And scrolling is needed to move around the zoomed image. But the maphighlight designers did hit upon a cool workaround for the lame functionality of MAP, AREA and IMG in HTML:

Their display is broken into three panes layered on top of each other. The top layer is the original image and its image map. The image is set to be invisible, so only the defined regions on the image are "visible" to the mouse and are activated as it moves over them. This is important because if the imagemap was behind the canvas the mouse events wouldn't fire and you'd get no interactivity*. Looking through the image and its map the user can see a transparent canvas, which gets drawn to in response to the mouse movements. Behind it, in the background is the scaled and panned image itself, this time visible. What this needs to work is support for HTML5. Fortunately all modern browsers have the required features. To get the zooming and scrolling to work I had to modify and augment the existing maphighlight jQuery plugin, which I renamed maphilite. There are some more features I want to add, but it basically works. I've tested it on the latest versions of Chrome, Firefox, Safari and Opera. I'm pretty sure it will work also in IE9.

Why not support older browsers? Firstly, it's just too hard. Secondly, anyone can install a free browser on their computer even if they currently use IE8 or less. Thirdly, in a few more months or years everyone will be using HTML5, so why bother?

I'd like to stress that this is just a demo at this stage. The next stage will be to create a bunch of image maps and define them in the HRIT system so they can be applied to images and transcriptions on the fly as the user scrolls through the text.

* I imagine some techies are asking 'Why not just use CSS z-indicies to stack the three panes?' Because MAPs are bound to their images and are not separately stackable, that's why. Either you have the canvas in front of the image and you get no detection of mouse movements or you can't see the canvas behind the image. Pointer-events might get around that but they don't work on IE or Opera. As a CSS-defined background the image can be scrolled by use of background-position and scaled with background-size. Otherwise I don't see how you can get it to work.