Denis Romanovski


AW Navigator (Auditory Web Navigator) research project

AWN is an auditory display application project for audible web navigation. AWN will function as a multimodal sound map representing the graphical/structural layout of web pages, navigated with the mouse and played back through headphones. Sonification of a web page layout aims mainly at improving the effectiveness of Internet surfing for visually impaired users, though its conceptual model can be applied in other auditory display/sonification implementations.

AWN has two principal layers of navigation: (1) browsing/recognizing the sonified layout of a web page as a sound map of semantically and spatially (stereo) positioned sounds representing individual layout objects and groups of objects; (2) retrieving context information from a particular layout object, presented in verbal mode (text-to-speech).

The AWN algorithm consists, in general, of the following processes: (1) web page analysis and interpretation, which includes analysis of the layout structure, of the objects present (text and navigation blocks, images, links, input fields, etc.) and of link navigation, and the definition of navigation anchor points; (1.1) remapping layout objects for possible semantic grouping, and scaling the resulting map; (2) assigning sound samples, sound effects and mixes; (3) sound playback according to mouse pointer movements over the sound map, with text-to-speech playback on the user's demand. A sketch of this pipeline is given below.
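A minimal sketch of how these processes could be chained; all class and method names here (AwnPipeline, PageModel, SoundMap, analyzePage, remapLayout, assignSounds) are illustrative assumptions, not project code:

    // Hypothetical pipeline sketch for the AWN processes described above.
    public final class AwnPipeline {

        public SoundMap build(String url) {
            PageModel page = analyzePage(url);       // (1) layout, objects, links, anchor points
            PageModel grouped = remapLayout(page);   // (1.1) semantic grouping and scaling
            return assignSounds(grouped);            // (2) samples, effects and mixes per object
        }

        PageModel analyzePage(String url) { /* parse the page, collect boxes and links */ return new PageModel(); }
        PageModel remapLayout(PageModel p) { /* group and scale layout objects */ return p; }
        SoundMap assignSounds(PageModel p) { /* map objects to sound cues */ return new SoundMap(); }

        static final class PageModel { }
        static final class SoundMap { }
    }
    // (3) Playback is then driven by mouse-pointer coordinates over the SoundMap,
    // with text-to-speech triggered on the user's demand.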

AWN implies the creation of an intuitive acoustic language (or "Birds' language"), i.e. a matrix of pre-recorded and generated sounds/sound symbols, along with applicable sound effects and mixes, intuitively recognizable from everyday experience, to be used for the sound representation of objects, structures, movements and actions in their spatial and/or structural relations. Why? Because we need sound samples that remain recognizable after various modifications; the sounds we experience in everyday life imprint strong associations and will therefore be recognized across a wide diversity of variations.

Auditory Web Navigator research (AWN) is an Open Research project. By analogy to Open Source, Open Research means applying public licensing to research, art and art-and-research projects, and other creative investigations and developments; an Open Research project is open for collaboration at any stage of research or development, and its materials are published in the public domain.

7 March 2007, Denis Romanovski


AW Navigator prototype

Multimodality:
- visual and auditory representation
- dynamic sound map:
  - "relative" and "absolute" positioning
  - dynamic sound cues


Structural and contextual analysis of a web page

Auditory Web navigation plugin database
- associative index of multiple sound samples for each cue
- index of sound effects required for semantic grouping and spatialisation

Text-to-speech
- using open-source synthesis engines, with Aural Style Sheets applied (a sketch follows)
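One possible open-source engine for a Java prototype is FreeTTS; the sketch below is only an assumption about engine and voice ("kevin16"), and mapping Aural Style Sheet properties (voice family, rate, stress) onto the engine's settings would be a separate step:

    import com.sun.speech.freetts.Voice;
    import com.sun.speech.freetts.VoiceManager;

    // Minimal FreeTTS sketch: speak the "context" of the object under the pointer.
    // Engine and voice choice are assumptions, not project decisions.
    public final class SpeakContext {
        public static void main(String[] args) {
            Voice voice = VoiceManager.getInstance().getVoice("kevin16");  // bundled US English voice
            if (voice == null) { System.err.println("kevin16 voice not found"); return; }
            voice.allocate();
            voice.speak("News column, twelve links.");                     // hypothetical object context
            voice.deallocate();
        }
    }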

GUI input
- mouse
- keyboard

GUI output
- visualisation: highlight objects/tags under the mouse pointer (see the sketch below)

Audio output: headphones
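A small Swing sketch of the visual output idea, highlighting the box under the mouse pointer; the fixed rectangles stand in for real layout objects, and everything here is illustrative only:

    import java.awt.Color;
    import java.awt.Graphics;
    import java.awt.Rectangle;
    import java.awt.event.MouseEvent;
    import java.awt.event.MouseMotionAdapter;
    import javax.swing.JFrame;
    import javax.swing.JPanel;

    // Highlight the layout box under the mouse pointer (placeholder boxes only).
    public final class HighlightPanel extends JPanel {
        private final Rectangle[] boxes = { new Rectangle(20, 20, 200, 60), new Rectangle(20, 100, 120, 40) };
        private Rectangle hovered;

        HighlightPanel() {
            addMouseMotionListener(new MouseMotionAdapter() {
                @Override public void mouseMoved(MouseEvent e) {
                    hovered = null;
                    for (Rectangle box : boxes) if (box.contains(e.getPoint())) hovered = box;
                    repaint();
                }
            });
        }

        @Override protected void paintComponent(Graphics g) {
            super.paintComponent(g);
            if (hovered != null) { g.setColor(Color.ORANGE); g.fillRect(hovered.x, hovered.y, hovered.width, hovered.height); }
            g.setColor(Color.BLACK);
            for (Rectangle box : boxes) g.drawRect(box.x, box.y, box.width, box.height);
        }

        public static void main(String[] args) {
            JFrame f = new JFrame("AWN highlight sketch");
            f.add(new HighlightPanel());
            f.setSize(320, 240);
            f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            f.setVisible(true);
        }
    }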

Java Sound API and Java-to-OpenAL dynamic operations:
multichannel sound playback, volume, pitch shift, panning, loop playback, attack/decay, crisp/muffled filtering, etc.
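A minimal Java Sound sketch of some of the dynamic operations listed above (looped playback, volume, panning); the sample file name is hypothetical, and pitch shift or crisp/muffled filtering would be handled through OpenAL or a DSP layer rather than the Clip API:

    import java.io.File;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Clip;
    import javax.sound.sampled.FloatControl;

    // Looped playback of one cue with volume and stereo panning via javax.sound.sampled.
    public final class CuePlayer {
        public static void main(String[] args) throws Exception {
            Clip clip = AudioSystem.getClip();
            clip.open(AudioSystem.getAudioInputStream(new File("cue.wav")));  // hypothetical 1-4 s sample

            FloatControl gain = (FloatControl) clip.getControl(FloatControl.Type.MASTER_GAIN);
            gain.setValue(-6.0f);                                             // volume in dB below full scale

            if (clip.isControlSupported(FloatControl.Type.PAN)) {
                FloatControl pan = (FloatControl) clip.getControl(FloatControl.Type.PAN);
                pan.setValue(0.5f);                                           // -1.0 = left, +1.0 = right
            }

            clip.loop(Clip.LOOP_CONTINUOUSLY);                                // loop the sample
            Thread.sleep(5000);                                               // let it play for a while
            clip.close();
        }
    }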

introduction

AWN will function as a multimodal sound map representation of the spatial layout of web pages, navigated with the user's mouse input and played back through headphones.

It serves primarily for Internet surfing by visually impaired users, though its model can be applied in other auditory display/sonification implementations.

Why?

  • spatialization vs. visualization
  • Sonification of a web page layout aims mainly to represent spatial arrangements, rather than just visual effects.
  • Design as spatial layout represents semiotic patterns of information, allowing the content (or its type) to be recognized in an instant.
  • And that is what ordinary screen readers are missing.
  • Spatial cognition involves motility (even imaginary).
  • Computer mouse as a haptic device (input).
  • Acoustic feedback vs. haptic feedback (output).
  • We need an auditory language: intuitive in recognition, spatial and/or capable of reflecting semiotic relations between objects on screen.
  • Long live earcons! But they can often be too annoying!

sounds from everyday experience?

  • Dynamic sound cues.
  • Pre-recorded sounds.
  • Sounds we experience in everyday life imprint strong associations, and will be recognized across a wide diversity of types and frequency bandwidths.
  • We need sound samples that remain recognizable after modification.
  • Each sound symbol should have several variations that are still close enough to be recognized as similar (example: typesetter).
  • Similar objects on the same page will sound similar, but in variations defined by location, object size, nearby objects, etc. (see the sketch after this list).
  • Duration of each sound sample: 1-4 s, played in a loop with varying rhythms.
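A minimal sketch of how several close variations of one sound symbol could be stored and assigned per object, so that similar objects sound similar but not identical; the class and the selection rule are assumptions:

    import java.util.List;

    // Illustrative only: one cue keeps several closely related 1-4 s sample variations
    // (e.g. different "typesetter" recordings); each object is assigned one variant
    // deterministically, so the same object always sounds the same.
    final class SoundCue {
        private final List<String> variantFiles;

        SoundCue(List<String> variantFiles) { this.variantFiles = variantFiles; }

        String variantFor(int objectId) {
            return variantFiles.get(Math.floorMod(objectId, variantFiles.size()));
        }
    }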

Mapping sounds: aura

  • aura: every object has a sound aura (radius ~100-120 px);
  • aura: sound volume varies with the distance from the mouse pointer: 1-180 out of a maximum of 256 (see the sketch after this list);
  • aura: a Doppler effect depending on pointer movement, to improve movement recognition;
  • aura: relative positioning, relative to the "source object";
  • aura: sound quality: "muffled".
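A sketch of the aura volume rule; only the ~100-120 px radius and the 1-180 (of 256) range come from the text above, the linear fall-off itself is an assumption:

    // Volume of an object's aura as a function of pointer distance.
    final class AuraVolume {
        static int volumeAt(double dx, double dy, double auraRadiusPx) {
            double distance = Math.hypot(dx, dy);       // pointer-to-object distance in pixels
            if (distance > auraRadiusPx) return 0;      // outside the aura: silent
            double t = 1.0 - distance / auraRadiusPx;   // 1.0 at the object, 0.0 at the aura edge
            return (int) Math.round(1 + t * 179);       // scaled into the 1-180 range (of 256 max)
        }
    }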

Mapping sound: object

  • object: sound volume: 256/256; while TTS is reading the "context of the object", the sound volume drops to 120/256;
  • object: sound quality: "crisp" or "sharp";
  • object box border: recognized by a sharply increased volume and a change in the crispness of the sound quality.

Mapping sound: where is the object?

Answer: the object (box) center's horizontal and vertical position relative to the browser window, plus the box size.

H object location: panning L-R;
V object location: loop frequency rising for upper objects and falling for lower objects (a sketch follows below);
Pointer location: relative to the browser window: panning L-R, loop rhythm. Note: since this sound is present constantly, it is better to let the user choose the sample (rustle, footsteps, a bounced ball, or other).
Quantity of objects: an echo or reverb effect can be used to convey the number of objects present (fewer objects, stronger echo); echo parameters might also be used to convey the distance from an object.
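A sketch of the positional mapping above; the directions (pan follows horizontal position, loop rate rises for upper objects) come from the text, the concrete numeric ranges are assumptions:

    final class PositionMapping {
        // Horizontal position -> stereo pan: -1.0 = left edge, +1.0 = right edge.
        static float panForX(double centerX, double windowWidth) {
            return (float) (2.0 * centerX / windowWidth - 1.0);
        }

        // Vertical position -> loop rate: faster looping for upper objects, slower for lower ones.
        // The 0.5-2.0 loops-per-second range is an assumption.
        static double loopRateForY(double centerY, double windowHeight) {
            double top = 1.0 - centerY / windowHeight;   // 1.0 at the top of the window, 0.0 at the bottom
            return 0.5 + 1.5 * top;
        }
    }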

Mapping sound: object size

  • object size: pitch shift (without changing duration): the smaller the object, the higher the pitch (see the sketch after this list).
  • object size: the smaller the object, the narrower the sound bandwidth.
  • object size: attack and decay: the smaller the object, the shorter the sound attack.
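A sketch of the size mapping; the directions come from the list above, the scaling constants are assumptions:

    final class SizeMapping {
        // Smaller object -> higher pitch; a few semitones per halving of relative area
        // (the constant 3.0 is an assumption).
        static double pitchShiftSemitones(double objectArea, double pageArea) {
            double relative = Math.max(objectArea / pageArea, 1e-6);
            return -3.0 * (Math.log(relative) / Math.log(2.0));   // e.g. 1/4 of the page -> +6 semitones
        }

        // Smaller object -> shorter attack; the 5-100 ms range is an assumption.
        static double attackMillis(double objectArea, double pageArea) {
            return 5.0 + 95.0 * Math.min(1.0, objectArea / pageArea);
        }
    }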

Dynamic cues: Language or database?

Compare with hieroglyphs, where a limited number of core ideas combined with a few combination rules produces a vast number of symbols and meanings. Even without knowing a particular symbol, it is possible to guess its meaning by recognizing its core components.

  • a minimal set of core ideas (see the sketch after this list):
    passive elements: text, picture, frame;
    active elements: links, buttons, input fields, etc.;
    actions: user actions: pressing, moving, typing, scrolling, in/out; and
    window "processes" or action feedback: loading, roll down, etc.
  • combination: linear mixing, sequential inclusion in a loop, imitating an envelope.
  • We can emphasize a positioned sound with certain associations, for example: add "water splashes" to the page footer.
  • AWN functions on two general levels of navigation: (1) browsing a sound map of aural symbols representing the layout of the web page and its structure; (2) retrieving context content using text-to-speech.
  • (1) The sound map may be "zoomed in and out", i.e. grouping table rows, columns and cells, and divs. Example: menu link group, news column, page footer, etc.
  • (2) Sound from TTS needs to be spatially mapped.
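A sketch of how such a small core vocabulary could be represented and combined per object; the enum values follow the list above, the combination class itself is an assumption:

    import java.util.EnumSet;
    import java.util.Set;

    // Core cue vocabulary following the list above; the combination rule
    // (mixing the component samples in one loop) is an assumption.
    enum CoreCue { TEXT, PICTURE, FRAME, LINK, BUTTON, INPUT_FIELD, LOADING, SCROLL }

    final class CompositeCue {
        private final Set<CoreCue> components;

        // e.g. new CompositeCue(EnumSet.of(CoreCue.PICTURE, CoreCue.LINK)) for an image link
        CompositeCue(Set<CoreCue> components) { this.components = EnumSet.copyOf(components); }

        Set<CoreCue> components() { return components; }
    }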

Web page analysis

  • collecting data (collect the hierarchy tree of the web page, box positions and their dimensions); see the sketch after this list
  • navigation analysis (define all links on the page and collect information on where they lead: inside the domain or out; collect link descriptions such as tool tips and keywords; if absent, preload the linked page and extract a single random paragraph; define interactive elements: buttons, check boxes, text fields, etc., and define their obligatory forms, if any; detect any scroll bars present)
  • image analysis (image types; images with links, banners and image-based menus, counters; ignore all background images; find repeating images; try to identify image graphics that must be included in navigation and ignore decorative images; if an image has no alt text, attempt optical character recognition)
  • defining the web page structure (frames, tables, divs, script objects, styles, etc.; attempting to group them into navigation blocks, such as menus, text blocks, etc.)
  • defining navigation modes (alternative layout?): a custom-rendered web page: grouped navigation, texts
  • define the language for the text-to-speech language module
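A minimal data-collection sketch using the open-source jsoup HTML parser (the choice of jsoup and the example URL are assumptions; box positions and dimensions would still require a rendering engine):

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // Collect the element hierarchy, link targets and images without alt text.
    public final class PageAnalyzer {
        public static void main(String[] args) throws Exception {
            Document doc = Jsoup.connect("https://example.org/").get();   // hypothetical URL

            // hierarchy tree: print each element with its depth
            doc.getAllElements().forEach(e ->
                    System.out.println(e.parents().size() + " " + e.tagName()));

            // links: do they stay inside the domain or lead out?
            for (Element a : doc.select("a[href]")) {
                String href = a.attr("abs:href");
                boolean internal = href.startsWith("https://example.org/");
                System.out.println((internal ? "internal " : "external ") + href);
            }

            // images without alt text are candidates for OCR; decorative images can be ignored
            for (Element img : doc.select("img")) {
                if (img.attr("alt").isEmpty()) System.out.println("no alt: " + img.attr("abs:src"));
            }
        }
    }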

The AWNavigator project was supported by The Interactive Institute and "Research and development in the arts" (KU), Kungl. Konsthögskolan (KKH).