Christopher Serflek, University of Toronto
Jutta Treviranus, University of Toronto
Virtual Reality Modeling Language (VRML) is an evolving standard for the creation and distribution of interactive distributed virtual environments on the Internet. Currently VRML is restricted to the creation of static three-dimensional (3-D) objects. These objects may serve as links to other Internet resources by specifying a Uniform Resource Locator (URL). Future developments of VRML are expected to incorporate sound, behaviors, agents, and multi-user environments. In its final form VRML will allow Internet users from any part of the globe to interact with each other and with created objects, animate or inanimate, in a three-dimensional virtual environment. VRML offers the same fundamental strengths as the World Wide Web (WWW): it is platform independent, an open standard, and designed to work over low-bandwidth connections.
This paper will briefly outline the current status of VRML as well as the associated access challenges. We will discuss future VRML standards and the potential challenges they will offer. As VRML is in its infancy, this paper is largely conceptual. Many of the current and future standards have yet to be implemented. Therefore, we intend to provide an initial framework to address these upcoming challenges. People have often speculated that virtual reality (VR) and virtual environments (VEs) will be of great benefit to persons with disabilities. We will try to describe the subsystem or infrastructure required to support possible benefits. We will also show that the challenges presented by future implementations and developments of VRML go far beyond the challenges encountered in a 2D graphical user interface. VRML introduces not only a third dimension but a whole new way of representing and interacting with information. These challenges cannot be met by simply expanding the appropriate software hooks to accommodate presently available access tools.
Theoretically, access challenges in a virtual world should be easier to deal with than in the real world. All information transmitted is computer mediated and can therefore be filtered, translated, amplified or attenuated depending on the needs of the user and the demands of the task. What is needed is the will and a clear knowledge of what is required by the user. In a constrained virtual environment we can predict and provide the required accommodations for specific tasks and specific users. Distributed virtual worlds introduce some of the unpredictability of the real world. The authors of the virtual world do not know who the users are, what platform they are working on, what browser they are using, how many users will be in their environment, or what the users will do to the objects in the environment or to each other. Likewise, when generating guidelines for VRML we must contend with the fact that VRML is meant to be generally extensible and should ideally serve many different applications (e.g., games, scientific visualization, simulation, social interaction).
The access challenges of VRML subsume all of the access challenges we have encountered thus far. At this point we have a number of advantages:
- VRML is in its infancy and we have the opportunity to influence its development,
- VRML is presently a standard, so influencing the standard will affect all applications,
- developers of VR and VRML are concerned with the human experience and therefore have the tools and the motivation to consider human interface challenges.
VRML, like HTML (Hypertext Markup Language on the World Wide Web), has a standardized language specification for material published on the Internet, associated authoring tools to create the material, and browsers to allow users to view the material. In order to make VRML accessible we must address the design and development of:
- the VRML specification,
- VRML browsers,
- VRML authoring tools,
- VRML display methods, and
- VRML control tools.
In addition, presently available alternative access systems need to be completely reconceptualized and retooled.
While VRML is in its infancy, VRML developers have been very open and responsive to suggestions for specific changes or additions to the VRML specifications or browser. It is now the responsibility of the access community to come up with generally accepted guidelines on how to provide access. We need to know what we want in order to provide guidance to developers and designers. We cannot simply insist upon access; we must be able to articulate how. We must also reach a consensus so that we do not confuse developers with conflicting recommendations and requests. In order to do this we need to answer some as yet unanswered questions in a large variety of fields including inter-human communication, navigation, manipulation, vision, audition, haptics, and information processing, to name a few. We need to develop a better understanding and quantification of how various individuals contend with the real world: what structures and tools do they depend upon? A few specific questions to be answered include: How do we best communicate, in alternative formats, complex simultaneous real-time events which are presented visually? What level of information is optimal to allow someone to make a timely and informed decision? What is a good balance between intelligent assistance and open-ended choice making?
The VRML Standards Process
The development of the VRML specification is a multi-step process, with features or additional capabilities being implemented in each version of the specification. VRML 1.0 simply allows for the display of three-dimensional graphics which can serve as links to other sites on the World Wide Web. Among the additions to be included in VRML 1.1 are sound and animation. VRML 2.0 is expected to include behaviors (objects behave based on time and events), interactions (a way to feed events into environments), multiple participants, and telepresence.
To facilitate the rapid construction of the first specification, the members of the WWW-VRML mailing list voted to adopt a subset of SGI's Open Inventor and to add nodes to support interaction with the World Wide Web (WWW) (Pesce et al., 1994). Through the combined efforts of the WWW-VRML list members, SGI, and Template Graphics Software (TGS), VRML has been acknowledged as the standard for 3D display on the Internet by numerous universities, corporations and institutions such as Mosaic Communications Corp., DEC, NCSA, and others. The design process is driven by the suggestion and debate of various features on the WWW-VRML list; the strongest and most applicable proposals are then introduced into the specification by its authors. Following this, a draft is submitted for review and comment towards the final specification. There is also a move to bring VRML to the Internet Engineering Task Force (IETF).
A comprehensive description of the VRML 1.0 specification is beyond the scope of this paper. For the access designer there are seven key elements to consider. These elements are the hierarchical ordering of the scene graph, the geometric description of objects and object properties, information nodes, the description field in the WWW Anchor node, the ASCII text node, and the coordinate system.
VRML files are composed and distributed as ".wrl" or world files. Contained within these files are the geometric descriptions of objects, their ordering and properties, environmental information, and references to additional resources such as wrl, jpeg, or au files. At the heart of VRML is the scene graph, which provides a hierarchical structure for ordering and implementing the nodes within a scene. The node is the basic building block used to create the scene. Each node holds a piece of information, such as a surface material, shape description, or geometric transformation. All 3D shapes, attributes, cameras and light sources present in a scene are represented as nodes (Josie Wernecke, 1994). Nodes which precede can affect nodes which follow in the scene graph. Some types of nodes can contain other nodes; the containing nodes are referred to as group or parent nodes, and the nodes they contain as child nodes (Bell et al. 1995).
As mentioned previously, the scene graph provides an overall structure for the inclusion of objects within a scene. Further, the hierarchical structure provides information regarding the relationships between objects. Using coordinate information, 3D spatial information regarding object placement can be ascertained. The geometric description provides information regarding object shape, size, and properties (e.g., material, color, transparency). Given this information, it is feasible to provide several methods of alternative representation and interaction at the user interface level.
It may be possible for a browser to use the constrained context and logical structural information dictated by the scene graph, geometry, properties, and spatial positioning, together with supplementary information provided by ASCII text, to construct user-relative representations of virtual environments through general filtering and agents. For example, a file could be parsed to provide the auditory output of "small qchair brown. It is one meter from your current position. Small transparent partly shiny table .5 meters ahead." One obvious problem with this is that the presentation of the world information is not in accordance with natural language. Further, any significance of the chair that was intended to be communicated visually will be lost. For example, the chair is in fact the chair for a Queen and is made of beautiful oak, and for the world and the task at hand this is significant. However, as the scene was parsed, the size of the chair was determined relative to the environment space and the other objects present. Visually, this chair was larger and considerably more elaborate than chairs in general, yet this could not be determined automatically. Additionally, the obstacle presented by the table would cause difficulties for navigation. There are possible methods of providing additional information to allow the experiencer to gain a greater understanding of the world.
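The kind of automatic parsing described above can be sketched as follows. This is only an illustration under stated assumptions: `SceneObject` is a hypothetical, already-parsed stand-in for an object in a world file (it is not part of VRML or of any browser), and the size thresholds are arbitrary. The sketch also reproduces the weakness noted above: because size is judged only relative to the other objects in the scene, the Queen's chair is reported as "small".

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SceneObject:
    """Hypothetical stand-in for one parsed object in a .wrl scene."""
    name: str
    position: tuple              # (x, y, z) in meters, world coordinates
    height: float                # bounding-box height in meters
    color: str = ""
    properties: list = field(default_factory=list)


def size_word(obj, scene):
    """Classify size relative to the tallest object in the scene (assumed thresholds)."""
    ratio = obj.height / max(o.height for o in scene)
    if ratio < 0.4:
        return "small"
    if ratio < 0.8:
        return "medium"
    return "large"


def describe(obj, scene, viewer):
    """Build a terse spoken phrase from geometry and properties alone."""
    words = [size_word(obj, scene), *obj.properties, obj.color, obj.name]
    phrase = " ".join(w for w in words if w)
    return f"{phrase}, {dist(obj.position, viewer):.1f} meters from your current position"


# Illustrative scene; all values are made up for the example.
qchair = SceneObject("qchair", (0, 0, 1), 1.0, color="brown")
table = SceneObject("table", (0, 0, 0.5), 0.5, properties=["transparent", "partly shiny"])
pillar = SceneObject("pillar", (3, 0, 3), 4.0, color="white")
scene = [qchair, table, pillar]
viewer = (0, 0, 0)

for obj in scene:
    print(describe(obj, scene, viewer))
```

Nothing in the geometry tells the parser that this chair matters; that information has to come from the author, which is what motivates the text-bearing nodes discussed next.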
The four final key factors come into play now. First is the info node. Generally used to communicate copyright and author information, this node can additionally carry general text strings which can be used for providing broader descriptions of an environment and its objects. Next, the ASCIItext node takes an ASCII text string and presents it as a 3D representation in the world. This node works in cooperation with the FontStyle node, which provides information about a font's style, family, and size. Finally, the WWWAnchor node can contain a description of links. Through these we have the possibility of providing higher level descriptions.
The following description illustrates a possible method of providing auditory access to the visual scene using the elements listed above. Upon entering the world, an info node could state, "This is the public chamber for the Queen. This is where she makes big decisions. She has used money purloined from the peasants to elaborately design and decorate this chamber." Most importantly, the user now has a general sense of the context and properties of the room. They now know that if they are looking for the Queen's private chambers they are in the wrong room but they are getting close. What is needed next is a high level description of the objects in the space so they can decide whether to explore or leave. To achieve the equivalent of a quick visual scan of the objects in the room, a list of objects may now be presented auditorily. The object name qchair sparks interest, as the other chairs were named chair. The user now requests additional information about this chair. In essence, they have isolated an area of the scene graph and now wish to traverse the child nodes of the parent node associated with the chair. The first child node section contains Material/ASCIItext/FontStyle nodes, which are interpreted auditorily as "large golden shiny metal text Queen Becky Rules". The user now knows this is the Queen's chair and that there may be something of interest around it. Interpreting the scene graph further, the text description for a WWWAnchor node is read: "sit (click) in the chair and be transported to the Queen's private chamber." Great! This is the goal of the user, and it was accomplished by presenting high level information for executive decision making and then interpreting selected lower level descriptions. All irrelevant information could be ignored, such as the hand position on the grandfather clock across the room.
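The two-pass exploration in this scenario (a high level overview first, then a selective traversal of one object's child nodes) might be sketched as below. The `Node` class and its `fields` dictionary are hypothetical simplifications made for the example; they are not the VRML file syntax or any browser's API, and the sketch assumes the author has given meaningful names to the top-level group nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified scene graph node."""
    kind: str                                  # e.g. "Separator", "Info", "AsciiText", "WWWAnchor"
    name: str = ""                             # author-assigned name, if any
    fields: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(node):
    """Depth-first traversal of a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def overview(root):
    """High level pass: scene description plus the names of top-level objects."""
    info = [c.fields["string"] for c in root.children if c.kind == "Info"]
    names = [c.name for c in root.children if c.kind == "Separator" and c.name]
    return info, names


def explore(root, name):
    """Low level pass: speak only the text-bearing children of one named object."""
    target = next(c for c in root.children if c.name == name)
    spoken = []
    for node in walk(target):
        if node.kind == "AsciiText":
            spoken.append("text " + node.fields["string"])
        elif node.kind == "WWWAnchor":
            spoken.append("link: " + node.fields["description"])
    return spoken


# Illustrative world based on the scenario in the text.
room = Node("Separator", children=[
    Node("Info", fields={"string": "This is the public chamber for the Queen."}),
    Node("Separator", name="chair"),
    Node("Separator", name="qchair", children=[
        Node("AsciiText", fields={"string": "Queen Becky Rules"}),
        Node("WWWAnchor", fields={
            "description": "sit (click) in the chair and be transported "
                           "to the Queen's private chamber"}),
    ]),
    Node("Separator", name="clock"),
])

print(overview(room))
print(explore(room, "qchair"))
```

The user hears the clock's name in the overview but never has to listen to its details, since only the subtree they select is traversed.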
Some of the components necessary for the implementation of the approach proposed above have been incorporated in the VRML specification upon the suggestion of the authors, namely the ASCII text node, description fields in the WWWAnchor node, and possible use of the info node for describing meta scene information in addition to the copyright description. Despite the availability of these features, such a proposal is admittedly brittle. First, it is highly dependent upon a logical hierarchical structure within the scene graph. Second, it is reliant upon text descriptions being added to the scene using a node whose primary purpose is to provide other information.
Principles similar to those discussed above could be employed for assisting navigation. An example is demonstrated by the SeekTool in Webspace (a VRML browser, SGI 1995). This tool allows a user to click on an object, causing their position to be automatically adjusted to close proximity to the object. This eliminates the need for laborious and complex manipulation of input devices. The same approach could be applied to create virtual guide dogs which accept high level instructions.
Another access approach is to explore alternative display methods and alternative input methods. An additional channel for the presentation of spatial and object related information is haptic display. It is quite feasible that geometry, shape and properties could be presented through the use of force-feedback and tactors. Unfortunately, such a hardware setup is currently prohibitively expensive. Further research and development is required in this area.
It is hoped that, despite the varying contexts of individual virtual environments, objects, structures and properties are consistent elements and can be utilized to develop a general method of interpreting the world, based upon user abilities, goals and needs. Critical to this are a logical hierarchical structure, consistent input/output protocols, and the cooperation of VRML authors, all of which may be threatened by independent commercial developments and economic pressures. In order to accomplish even the modest level of access described in the example, modifications must also be made to the VRML browsers and authoring tools. A preliminary discussion of the issues to be considered follows.
Currently there are no browsers capable of supporting the user actions described in the above example. In fact, some do not support even access fundamentals such as keyboard equivalents. Ideally VRML browsers should support the type of user navigation and exploration described above, should provide basic access features and should be compatible with alternative access control and display systems. Before browser developers can begin to provide these accommodations we must develop generally accepted access system guidelines and standards. Unfortunately this may mean developing radically new access technology. Currently available screen reading packages are of little use, as a completely different paradigm is used to present information and possible interactions. Once these standards are developed we must ensure that the standards processes are followed and that companies do not deviate from standards (as Netscape has).
New browsers are becoming available on a weekly basis, with many more in development. Currently SGI's Webspace (SGI 1995) is the only browser available in full release supporting all VRML 1.0 features. Template Graphics Software is working with SGI to port Webspace to other platforms such as Windows NT (TGS 1995). InterVista's WorldView (Parisi 1995) is currently the only VRML browser available for Windows 3.1. These and others (Serflek 1995c) are in various stages of development. Source code is available for Webspace, and is promised to be available for IICM, NCSA and the University of Minnesota's VRweb when it reaches its release stage (Hardenbergh, 1995).
The ATRC is presently seeking support to develop accessibility utilities for VRML browsers. Given the level of development of VRML in general, this support may not become available in the immediate future.
VRML Authoring Tools
A number of VRML authoring tools are emerging on the market. One example is Ez3d (Radiance, 1995), a 3D package which can output VRML files. Virtus has also created a VRML tool (Virtus, 1995), as have a number of other companies (Serflek 1995c). In addition to the authoring tools, there are conversion tools available which allow the conversion of existing 3D files into the .wrl format.
Of greatest importance is the type of VRML file produced by these tools. It is conceivable that a file can be produced which looks identical to an accessible file (when displayed) but contains none of the structural information or hierarchical organization needed to provide access. We should investigate ways in which VRML could be structured via the scene graph that would not affect the normal use of VRML, but would aid in parsing the file by limiting the number of variants to contend with. This may be as simple as encouraging developers to adopt a recommended order for information that would otherwise be ordered in an arbitrary manner. By modifying authoring and conversion tools we can affect a large amount of the material published, as VRML files will rarely be prepared by hand, VRML authoring being much more challenging than HTML authoring. We must also work with authoring tool manufacturers to increase the accessibility of the software.
A draft specification for version 1.1 of VRML will be released shortly. Features expected to be included in the draft are simple animation, i18n (internationalized) text, caching of objects (e.g., on a CD-ROM), video streams as textures, annotation text, and possibly both ambient and localized sound (Hardenbergh 1995, Pesce 1995). A much desired feature is the addition of annotation text. This addition will allow for a textual description of various media (e.g., images, sounds, objects). This annotation will have benefits even if it is difficult to derive a literal text interpretation of a medium. In our work with HTML, we have learned that alternate text can be even more beneficial if it provides functional information about the document. For example, a logo serves to distinguish a page and give an indication of the content of the document. Additionally, an image map allows a viewer to quickly ascertain the resources and functional aspects of a document. Through small pilot studies, we have found that primary importance should be placed on providing a general sense of the context and content. This allows a user to develop a mental model. Put simply, it is just as important, if not more important, to state "why" as to state "what".
To fully utilize the possibilities of annotation text, two conditions must be met. First, a greater understanding of effective placement of the annotation text node within the scene graph must be developed. The annotation node in 1.1 provides an official method of providing high level descriptive information, thereby replacing the unofficial use of the info node in VRML 1.0.
The second and primary condition for the success of annotation text is that it must be utilized. As it will be generated by humans, there is no assurance that this information will be available or accurate. This strongly indicates the need to find methods to complement the annotation text information. One possible avenue exists. It is likely that there will be strong growth in the distribution of CD-ROMs and sites for 3D models. These models would serve either as alternative methods of viewing high traffic sites or as the VRML equivalent of clip-art. It should be possible to encourage the distributors of this material to add descriptions of these objects. This would form the basis of lower level descriptions of objects in scenes, meaning that authors need only provide descriptions for the scene as a whole and for complementary media.
The inclusion of sound will be of additional benefit in providing audio cues. Ambient sounds will provide a general sense of the environment as a whole. Localized sound will aid in providing directional cues if the auditory information is presented as 3D. The inclusion of sound will necessitate the development of a captioning strategy appropriate for a 3 dimensional environment. This may entail 3D display of captioning as well as multimedia captions incorporating animation and video clips.
With the introduction of 2.0 features VRML will become much more exciting and enjoy more general applicability for a variety of uses. It will become a very important standard. The planned features to be added are behaviors (objects behave based on time and events), interactions (a way to feed events into environments), multiple participants, telepresence, and sound if it was not implemented in the VRML 1.1 specification (Hardenbergh 1995). This new standard will subsume previous access solutions and challenges. To understand the infrastructure through which this will be accomplished, it is useful to divide the main components into three categories: geometry, distribution or transportation protocols, and behaviors. This is all very speculative, however, as it is still very much under debate. Additionally, this distinction is somewhat arbitrary, as the three components are largely intertwined and dependent upon each other.
It is difficult to foresee what changes or additions to geometry will be made in VRML 2.0. Likely, these will be extensions to VRML 1.1 (e.g., adding nodes, refining existing nodes and their usage and properties, or adding other features from Open Inventor). It is not anticipated that this will introduce any additional major access concerns. However, as we learn more about access needs we may wish to introduce minor changes to enhance access.
The current method of distributing VRML environments by means of a client/server approach utilizing HTTP will not be possible for 2.0 features. It has become necessary to work towards developing a Virtual Reality Transportation Protocol (VRTP). This is necessary for several reasons, the primary reason being that present methods do not meet design assumptions regarding bandwidth, latency, efficiency (Brutzman 1995), and compatibility (Roehl 1995a). It may be necessary to adopt different types of communication for different tasks. One possible method of meeting the requirements of bandwidth, low latency, efficiency and compatibility is to divide communication into four main categories: Light-weight Interactions, Network Pointers, Heavy-weight Objects, and Real-time Streams (Brutzman 1995).
- Light-weight Interactions give information which can be used to update the position and velocity of an entity. They may also give information about status. This information is not sent continuously. In order to increase efficiency and decrease bandwidth requirements, strategies are employed which update the information only if the status has changed by a predefined amount, or at preset time intervals.
- Network Pointers are used to update each member's registry of all other members in a multicast group. Both Light-weight Interactions and Network Pointers can be transmitted through more efficient, less reliable means.
- Heavy-weight Objects encompass interactions which must transmit large amounts of data in a reliable manner, such as .wrl files, GIFs, etc. Heavy-weight Objects will likely continue to employ HTTP or HTTP-NG.
- Real-time Streams refers to information which must be communicated in real time, such as live video and audio. This will likely be done using multicast channels.
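The update strategy described for Light-weight Interactions (send only when the state has changed by a predefined amount, or when a preset interval has elapsed) can be sketched as a small filter. This is a minimal illustration, not the protocol of any existing system; the class name, the 0.5 meter threshold and the 5 second interval are all assumptions chosen for the example.

```python
from math import dist


class UpdateFilter:
    """Decide whether an entity's position is worth transmitting (illustrative only)."""

    def __init__(self, min_move=0.5, max_silence=5.0):
        self.min_move = min_move          # meters; assumed change threshold
        self.max_silence = max_silence    # seconds; assumed maximum gap between updates
        self.last_pos = None
        self.last_time = None

    def should_send(self, pos, now):
        """Return True if this state should be sent, and record it as the last sent state."""
        first = self.last_pos is None
        if (first
                or dist(pos, self.last_pos) >= self.min_move
                or now - self.last_time >= self.max_silence):
            self.last_pos, self.last_time = pos, now
            return True
        return False


# Usage: one filter per entity, consulted each time the local state changes.
avatar_filter = UpdateFilter()
samples = [((0.0, 0.0, 0.0), 0.0), ((0.1, 0.0, 0.0), 1.0), ((0.6, 0.0, 0.0), 2.0)]
sent = [avatar_filter.should_send(pos, t) for pos, t in samples]
print(sent)
```

From an access perspective the thresholds matter: an alternative display that describes movement in words may need different update granularity than a visual renderer, which is one reason the choice of communication priorities deserves monitoring.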
There is much work to be done in improving and enhancing the current methods of Distributed Interactive Simulation communication (Brutzman 1995). There is doubt that current methods of communication can be generalized to support general VEs and complicated non-deterministic behaviors (Roehl 1995). There is a move to have current methods of network communication form the basis of the low level behavior and to develop new schemes to handle more complex behaviors. The priorities used to select the methods of communication may not match the requirements of the access features. These decisions will need to be carefully monitored.
There are two main competing proposals for incorporating behavior in VRML. The first is to incorporate Open Inventor's sensors and engines (Brutzman 1995). There is little detail and little support for this approach.
The second is based on Roehl's (1995b) working paper on behaviors. Roehl (1995b) suggests several levels of behavior based upon the extent to which the behavior is determined or predictable. For example, a mountain is very determined, the behavior of a clock would be largely predictable, and a foraging squirrel may have a set of goals but its method of executing the goals is unpredictable.
Lower level behaviors (e.g., the mountain and the clock) can be transmitted using the communication methods for light-weight interactions discussed above. Greater difficulty and controversy are associated with higher level behaviors. There is little agreement as to whether there should be a standard behavioral description language, what that language should be, whether the behavioral description should be tied to the object or separate from it, and how the information should be transmitted. As yet it is hard to determine which approach would be beneficial from an access perspective.
Behaviors will present challenges that are new and complex. We must interpret behaviors given the available information and present meaningful descriptions, avoiding cognitively taxing descriptions such as "arm rotating .04 meters per second on the Y axis" when the point to communicate is that a glass is being "picked up". The introduction of less predictable behaviors in multi-user environments will complicate matters.
Agent do, Agent do too?
An interesting approach, modeled on real life, could be developed in distributed environments which include multicasting. The presence of other users in the environment can be used to assist the user of an alternative access system. We can collect data on how others are behaving in a VE and use this information to present a limited set of meaningful choices to the user of an alternative access system, thereby reducing the number of steps required to complete a goal and giving lower priority to choices which are inappropriate for the situation. This can be done in a non-interfering, non-invasive manner.
Consider the following situation. A person is walking through the forest with a group of friends. The group as a whole happen to be members of the Sasquatch Appreciation Society. Suddenly there is a loud noise from behind a specific group of bushes, and a loud roar. Apparently, the Sasquatch does not wish to be appreciated, as the members of the society quickly assess. At this point, most members deduce that it is best to leave and leave quickly. This is a fight or flight situation; quick reaction time is essential. One member of the group has only one voluntary control, utilizing a single switch. Even if this had occurred directly in front of her, or if someone had yelled "RUN!!", she would have had difficulty navigating herself away quickly enough to have a good laugh later. Agents could detect that upon the arrival of the new entity, the majority of previously present entities suddenly and quickly moved away from that specific entity. As all update information for all users is potentially transmitted to all other members, the woman's browser could now present her with the option "run". Upon selecting this option the browser could automatically move the user's avatar at a similar speed and in a similar direction as all others in the group. It is important to note that the user remains in control, other choices are available, and the user can choose to ignore the priority choices presented.
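The agent's reasoning in this scenario might be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the dictionaries of user positions, and the one meter retreat threshold are all hypothetical, and a real agent would work from the stream of position updates rather than two snapshots. The function only proposes a movement; whether to take it remains the user's choice.

```python
from math import dist


def suggest_flight(before, after, threat, min_retreat=1.0):
    """Return the mean displacement of those who fled if most users moved away
    from `threat`, otherwise None.

    before, after: dicts mapping a user id to an (x, y, z) position.
    threat: (x, y, z) position of the newly arrived entity.
    min_retreat: assumed minimum increase in distance (meters) to count as fleeing.
    """
    fled = [uid for uid in before
            if uid in after
            and dist(after[uid], threat) - dist(before[uid], threat) >= min_retreat]
    if len(fled) * 2 <= len(before):      # not a majority: offer nothing special
        return None
    moves = [tuple(a - b for a, b in zip(after[uid], before[uid])) for uid in fled]
    return tuple(sum(axis) / len(moves) for axis in zip(*moves))


# Illustrative snapshot: two of three companions retreat from the new entity.
before = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (0, 0, 1)}
after = {"a": (0, 0, 3), "b": (1, 0, 3), "c": (0, 0, 1)}
sasquatch = (0, 0, -2)

move = suggest_flight(before, after, sasquatch)
if move is not None:
    print('Priority choice offered: "run", moving by', move)
```

Because the suggestion is derived from what other participants actually did, it requires no special authoring of the world and does not interfere with the other users.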
In meeting the access challenges of VRML we should avoid "regressive" solutions; we should not insist that developers include previous generation technology or translate new information structures into old forms. We should exploit the new technology and the new possibilities and direct them to meet our needs.
It is imperative that the access community develop generally accepted guidelines on how to provide access. We need to know what we want in order to provide guidance to developers and designers. We cannot simply insist upon access; we must be able to articulate how. We must also reach a consensus so that we do not confuse developers with conflicting recommendations and requests.
VRML presents access challenges which surpass anything we have faced so far. It also presents tools which are more powerful than any we have had at our disposal to date.
Bell, G., Parisi, A., & Pesce, M. (1995). "The Virtual Reality Modeling Language Version 1.0 Specification". http://vrml.wired.com/vrml.tech/vrml10-3.html
Brutzman, D. P., Macedonia, M. R., & Zyda, M. J. (1995). "Internet Infrastructure Requirements for Virtual Environments". ftp://taurus.cs.nps.navy.mil/pub/auv/brutzman/nii_2000.txt
Dickerson, J. A. & Kosko, B. (1994). "Virtual Worlds as Fuzzy Cognitive Maps". Presence 3(2), 173-189.
Gossweiler, R., Laferriere, R. J., Keller, M. L., & Pausch, R. (1994). "An Introductory Tutorial for Developing Multiuser Virtual Environments". Presence 3(4), 255-264.
Hardenbergh, J. C. (1995). "VRML Frequently Asked Questions". http://www.oki.com/vrml/VRML_FAQ.html
IICM (1995). "VRweb". ftp://iicm.tu-graz.ac.at/pub/Hyper-G/VRweb/
Intervista (1995). "WorldView". http://www.hyperion.com/intervista/technology.html
Paragraph (1995a). "Home Space Builder". http://www.us.paragraph.com/3dspaces/catalog/empties/
Pesce, M. & Behlendorf, B. (1994). Moderators of "WWW-VRML" listserv. http://vrml.wired.com/
Pesce, M. (1995). "Keynote Address to WWW '95". http://vrml.wired.com/arch/1390.html
Radiance (1995). "Ez3d". http://www.webcom.com/~radiance/vrml.html
Reynolds, C. (1995). "Boids (Flocks, Herds, and Schools: a Distributed Behavioral Model)". http://reality.sgi.com/employees/craig/boids.html
Roehl, B. (1995a). "Distributed Virtual Reality -- An Overview". Working Paper. http://sunee.uwaterloo.ca/~broehl/behav.html
Roehl, B. (1995b). "Some Thoughts on Behavior in VR Systems". Working Paper. http://sunee.uwaterloo.ca/~broehl/behav.html
Serflek, C. (1995a). "Initial Survey of Virtual Reality Modeling Language Access Issues". Internal Draft. http://www.utirc.utoronto.ca/AdTech/rd/vrml/intech001.html
Serflek, C. (1995b). "Accessability Update". Internal Notes. http://www.utirc.utoronto.ca/AdTech/rd/vrml/update.html
Serflek, C. (1995c). "VRML Tools and Information Links". http://www.utirc.utoronto.ca/AdTech/rd/vrml/links.html
Silicon Graphics Inc. (1995). "Webspace". http://www.sgi.com/Products/WebFORCE/WebSpace/
Van Hensbergen, E. (1995). "Distributed VR Mailing List". http://www.csh.rit.edu/~airwick/dist.html
Virtus (1995). "Virtus". http://www.virtus.com/
Template Graphics Software (1995). "Webspace Information Home". http://www.sd.tgs.com/
Treviranus, J. (1994a). "Virtual Reality Technologies and People with Disabilities". Presence 3(3), 201-207.
Treviranus, J. (1994b). "Mastering Alternative Computer Access: The Role of Understanding, Trust, and Automaticity". Assistive Technology 6(1), 26-41.
Treviranus, J. & Serflek, C. (In Press). "Alternative Access to the World Wide Web". CSUN 1995 proceedings.
Treviranus, J. & Serflek, C. (1995). Moderators of "The VRMLACCESS-LIST" listserv. http://www.utirc.utoronto.ca/AdTech/rd/vrml/vrmlaccess-list.html