Evaluation of the JITM System
The JITM system was developed in splendid isolation. As shown by the small number of attendees at the 1996 workshop on electronic editions, there were and still are few people working on such projects here in Australia. At the time, in the Australian Scholarly Editions Centre, it could also be described as a case of the blind leading the blind when it came to SGML. Perhaps this isolation, and the idea of producing a system centred on maintaining the authenticity of the transcription, assisted in the conception of the JITM paradigm.

As part of this project, a survey was run amongst consenting attendees of the electronic editions workshop, to gather information from other developers of electronic editions. The goals of this survey were to see whether the experiences of the author were common amongst developers, and to check whether the features of the JITM paradigm match the requirements of other developers of electronic editions.

The first section presents the results of the survey and comments on whether the results support the reasoning behind the development of the JITM paradigm. The recent release of the Guidelines for Electronic Scholarly Editions [MLA, 1997] gives us another benchmark against which to measure the JITM paradigm in the second section. The final section discusses some non-technical obstacles to the acceptance of the JITM paradigm by the scholarly editing community.

6.1 A Comparison against Current Practices

The survey was designed to determine the current practices of developers of electronic editions. Its major weakness is the small number of participants, and so any statistical analysis of the data will have limited validity. However, trends in the data give some support to assertions made about the current state of the field of electronic editions, and indicate that the advantages of the JITM paradigm would seem to address problem areas in current practice.
The original survey, distributed to the participants via email, can be found in Appendix A. The returned survey responses are available on the demonstration disk provided with this report. Of the workshop attendees polled, sixty-six percent responded that they would participate in the survey, a total of ten participants. The group consisted of PhD candidates creating critical editions for their degrees; scholarly editors interested in computer-based tools for creating printed critical editions; and scholarly editors interested in creating electronic editions. Because of the small number of participants, and the general lack of experience with electronic editions, the participants were asked to give hypothetical answers to those questions for which they had no direct experience.

Each section of the results contains a brief description of the information the set of questions was attempting to discover; a table of results; and the full text of each question with the anecdotal majority response and comments on the result. In the tables of results the following symbols are used for brevity:
In the anecdotal results for the questions, a majority answer is given as the aggregate result for a question where a "yes" or "no" result was expected. The result is based only on those responses where either "yes" or "no" could be determined from the survey response. To give some indication of the relevance of these results, the aggregate answer is given in the following form:

Majority answer (7/9) is "yes".
The numbers in parentheses indicate the number of responses that gave the majority answer out of the number of valid responses for the question. The example above indicates that for nine valid responses, seven were "yes". Omitted responses were either indeterminate or gave no answer.

For some of the questions where a numerical answer was expected, some of the responses had to be interpreted. In some cases numerical values had to be estimated from verbal descriptions, and in others responses were in the form of a numerical range (e.g. 60-80%). In the latter case the mid-point of the range has been entered into the table. Statistical analysis of the results for such a small number of responses would not have much validity, especially considering the interpretations noted above. Therefore the results of the questions with numerical answers are presented as histograms showing the spread of the results and, hopefully, the general trends.

6.1.1 Questions for all editors of scholarly editions

This section of the survey contains questions pertinent to the creation and maintenance of any scholarly edition, electronic or otherwise, involving the transcription of an original source document. The section was included in the survey to determine whether the assumptions made in this report about certain aspects of the creation of a scholarly edition are recognised by other scholarly editors, and to give some indication of the importance they place on these concerns.
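As a minimal sketch of the reporting conventions described earlier in section 6.1 (a majority answer reported as a count out of the valid responses, and a numerical range entered as its mid-point), the following Python fragment may help. The function names and the sample responses are hypothetical and are not taken from the survey itself; resolving a tie in favour of "yes" is also an assumption, since the report does not say how ties were handled.

```python
import re
from typing import Optional, Sequence


def majority_answer(responses: Sequence[Optional[str]]) -> str:
    """Summarise yes/no responses, ignoring indeterminate or missing ones."""
    # Only responses that could be read as "yes" or "no" count as valid.
    valid = [r for r in responses if r in ("yes", "no")]
    yes = valid.count("yes")
    no = len(valid) - yes
    # Assumption: a tie is reported as "yes"; the report does not cover ties.
    answer, count = ("yes", yes) if yes >= no else ("no", no)
    return f'Majority answer ({count}/{len(valid)}) is "{answer}".'


def numeric_value(text: str) -> Optional[float]:
    """Return a single number, taking the mid-point of a range like '60-80%'."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return None  # verbal descriptions would need manual estimation
    if len(numbers) >= 2:
        return (numbers[0] + numbers[1]) / 2
    return numbers[0]


# Hypothetical responses: seven "yes", two "no", one indeterminate.
sample = ["yes"] * 7 + ["no"] * 2 + [None]
print(majority_answer(sample))   # Majority answer (7/9) is "yes".
print(numeric_value("60-80%"))   # 70.0
```

Verbal descriptions of quantities are deliberately left unhandled here, because the report states that such values were estimated by hand.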
6.1.2 Questions for creators of electronic scholarly editions

This section is designed to determine what forms of electronic edition are currently being created, and whether the participants have considered some of the maintenance and authenticity issues raised in this report.
1 Is the computer-based version of the edition based on a printed publication?
Majority answer (7/8) is "yes".
Comment: This would tend to indicate that the participants see electronic editions as an experiment. Anecdotal evidence also suggests that electronic editions are seen as a way of reducing the cost of access to critical editions so that they can be used as teaching materials.

1a If the answer to question 1 is yes. Please describe the process of conversion from printed to electronic form.
The most common answer was that the electronic editions used the same ASCII transcription files used for the printed editions.
Comment: A good example of the reuse of digital assets. One response of note is the use of Adobe's Capture OCR utility to create the electronic edition. This utility produces a single document that combines a facsimile of the original with a textual translation produced using OCR technology.

2 Please give details of the presentation software used for your edition (eg. specialised software, HTML Browser, SGML browser).
The majority answer is that the edition's presentation engine is an HTML browser, with SGML preferred but not currently viable.
Comment: It is likely that this result is due to two factors. The first is the availability of HTML expertise due to its use on the World Wide Web. The second is likely to be that few cheap SGML browsers are available that support DTDs useful for creating scholarly editions. Of note is one electronic edition which uses Adobe's PDF format, to the apparent satisfaction of its creator.

3 Have you, or do you plan to implement any mechanism in your edition to prevent accidental or deliberate manipulation of the contents of copies of your edition?
Majority answer (9/10) is "yes".

3a If the answer to question 3 is "yes". Please give details (ex. publication on CD ROM).
For those who replied to this question there was a range of answers including:

Comment: The results indicate some concern with this issue, which is further expressed in later questions. However, most of the answers described proposed mechanisms rather than mechanisms actually in use.

4 Have you considered or developed any means by which your edition can be enhanced after publication?
Majority answer (8/8) is "yes".

4a If the answer to question 4 is "yes". Please give details.
The majority of answers received for this question detailed what the editors would like to be able to add to their edition, not how this facility would be implemented. Only one respondent replied with details of how their edition could be further enhanced: the edition was mounted on a read-only Internet server, and they assumed they could make whatever changes they wanted at any time.
Comment: A possible reason for the general misinterpretation of this question is that the participants are users of these systems rather than developers, and do not see how such a system could be implemented.

5 Do you plan to make your electronic edition available for access on the World Wide Web?
Majority answer (8/9) is "yes".
Comment: Some participants mentioned needing to get permission from the publisher of the hard copy version of the edition. This leads into another area of contention for electronic editions, the issue of copyright and fair use. The last section of this chapter looks briefly at how copyright concerns may affect the acceptance of the JITM paradigm.

6 Do you have concerns about the accuracy and authenticity of editions published in the digital media?
Majority answer (10/10) is "yes".
Comment: This strong response indicates the respondents are strongly aware of the virtual nature of digital assets (perhaps suspicious might be a better term for scholarly editors).

6a If the answer to question 6 is "yes". Please detail your misgivings.
There were many concerns expressed about electronic editions.
The most common concerns were:

Comments: This could be indicative of the newness of electronic editions and the way people perceive them. Concern was expressed about a number of web-based projects underway which are solely interested in digitising existing works so that they can be accessed using web technology. Some scholarly editors saw the lack of authority of these projects as detrimental to the scholarly environment. Several participants expressed the belief that publication of an electronic edition, especially on the World Wide Web, should result in a dynamic, growing work. Other participants had concerns about the revision control procedures and longevity of Internet-based editions. Obviously this is a contentious area.

6.1.3 Questions for users of SGML

This section looks at the prevalence of SGML in the creation of electronic editions and attempts to establish support for the assertion made in this report about the difficulty of implementing the TEI DTD.
1 Do you use the Text Encoding Initiative's DTD?
Majority answer (5/6) is "yes".
Comment: This is a good sign, but a number of responses indicated difficulty in handling the DTD or in finding a browser that could handle the markup.

1a If the answer to question 1 is "no". Which DTD do you use and why?
Those not using the TEI DTD for their documents indicated that they were using HTML.
Comment: There was evidence that some users of HTML were not aware that it is an application of SGML.

2 Do you find the hierarchical nature of SGML inhibiting?
Majority answer (5/5) is "yes".
Comment: Most answers concerned themselves with the difficulty of understanding the paradigm and markup language rather than with the hierarchical nature of SGML. Perhaps this question could have been better worded.

2a If the answer to question 2 is "yes". How do you get around this limitation?
The majority answer involved reducing the amount of meta-data being marked up.
Comment: This response could be seen as an indication of the steep learning curve of SGML and the difficulty of proofreading the encoded files.

3 Would you be interested in an SGML-based system for creating electronic editions which had the following characteristics:
Majority answer (7/7) is "yes".

4 Would you like to be kept informed of further developments of the system proposed in question 3.
Majority answer (7/7) is "yes".

6.1.4 Demonstration Software

This section is added to determine if the participant is interested in being included in a beta test program for a production version of the JITM system if it is developed further.
1 Are you interested in looking at the demonstration software?
Majority answer (8/8) is "yes".

2 Are you interested in being kept informed about the development of this software?
Majority answer (9/9) is "yes".
Comment: The last four questions are included in this report for completeness. They were included in the survey to find potential beta site testers for a JITM system if it is developed further. Their high majority responses in favour of finding out about a new system for the creation of electronic editions could indicate a level of dissatisfaction with the currently available tools. However, the questions were somewhat leading, and this result could also be caused by bias. A further survey with a larger number of participants, based on the distributed demonstration software, would help establish the veracity of the results obtained from this survey.

6.1.5 An Overview of the Results

The basis of this comparison is to discover how well the JITM paradigm provides for the user requirements gathered from the survey results. This will be covered by looking at several key issues to see whether the JITM paradigm provides a solution to these user requirements:
Each issue will now be discussed in detail with reference to relevant questions from the survey and how the JITM paradigm deals with these issues.

Authenticity, by far the issue of most concern. The high edition preparation overheads indicated in Questions 1 & 2 of Section 1 and the concerns about authenticity expressed in Questions 3, 3a, 6 & 6a of Section 2 indicate that a guarantee of the authenticity of the transcription of the original source is of high concern to scholarly editors. The JITM paradigm ensures this authenticity through its abstracted markup mechanism and its use of a Manipulation Detection Code (MDC) scheme for validating the transcription files. These two features ensure that any perspective of the work must be based on an authentic copy of the transcription file, and further that any set of tagRecords created for a modified copy of the transcription will not work on a true copy of the file.

Ease of Use. Questions 2 & 2a of Section 3, and the concern over proofreading of the marked-up files, indicate that any paradigm that reduces the complexity of the markup of an edition, and thereby reduces the difficulty of proofreading, will be easier to use. The JITM system achieves these goals by allowing the editor to mark up specific aspects of the document without markup from other aspects making the task more difficult. Once an aspect has been encoded into a copy of the transcription file and validated against the DTD for the document, the markup is extracted and stored in tagRecord sets. Many different perspectives of the document can be created from these tag sets without including unnecessary tags.

Extensibility. The most innovative aspect of the JITM system is that, by separating the meta-data from the transcriptions, meta-data tag sets can be created after publication of the transcriptions.
This gives anybody the capability of adding meta-data to the edition, thereby increasing the usefulness of the edition, without having to be concerned about creating and authenticating new states of the transcription. The existing paradigm for the creation of scholarly editions is limited by the finality of publication. The responses to Questions 3, 3a, 3b & 4 of Section 1, Question 4 of Section 2 and Question 2a of Section 3 indicate that a system that allowed the controlled extensibility of an edition after publication would be a great advantage in this field.

Despite the limited size of the survey population, it would appear that the JITM paradigm not only handles the major user requirements but would also extend the capabilities of electronic editions. To prove these assertions, further surveying would need to be done with a larger population of users. Since the JITM paradigm promotes a collaborative development environment for electronic editions, it should be easy to test these assertions using a single test edition.

It could be argued that a survey is not the best way to establish the real requirements for a proposed computer system, and that the survey could be biased towards giving certain results. Therefore the next section looks at the recently released Guidelines for Electronic Scholarly Editions [MLA, 1997] to see whether the JITM paradigm fits the specified requirements for electronic editions.

6.2 A Comparison with the MLA Guidelines

The Guidelines for Electronic Scholarly Editions [MLA, 1997] is a large document covering many aspects of the creation and maintenance of an electronic edition. Since in effect the JITM paradigm is a mechanism for creating electronic editions, the part of most relevance to this discussion is the section of the guidelines concerned with standards for such editions [sect. I, pp. 3-5, MLA, 1997]. This section is broken up into five smaller sections:
This section will discuss the conformance of the JITM paradigm with these aspects of the MLA guidelines.

6.2.1 Character Set

As described in the chapter on system design, the JITM paradigm uses the ISO 646 character set for the encoding of the text of the transcription file as well as any markup to be inserted from a tagRecord. Any characters that appear in the original document that are not supported in this character set are encoded into the transcription file as SGML entities. This is in conformance with the MLA Guidelines. An added feature of the JITM paradigm is that the file format of the transcription file precludes the presence of end-of-line characters within the body of a text element in the transcription. This does not violate the rules of SGML or the TEI, which generally ignore white space characters within text elements, but avoids a potential problem with different platforms treating the end of a line differently.

6.2.2 Encoding Norms

The guidelines specifically recommend the use of the TEI DTD for the encoding of editions. They go so far as to state that the use of another standard would need to be "fully justified and explained". The JITM system not only conforms with this requirement, but depends on the global "id" attribute of the TEI DTD for its referencing scheme.

6.2.3 Meta-data

This section continues the guidelines' support for the work of the Text Encoding Initiative (TEI) by recommending that the edition include a section with meta-data for the edition based on the TEI header. This meta-data should include the following:
Since the JITM paradigm uses the TEI DTD, it will include the capability for the editor of the edition to use all aspects of the TEI header, with some modifications which will now be discussed.

The encoding scheme of the edition, formalised in the TEI header as the encoding description element (<encodingDesc>), details the level of markup to be found in the document. This cannot be determined in the JITM paradigm prior to the selection of the appropriate tag sets by the user, and so this requirement cannot be adhered to in the way the guidelines expect. However, since the header file for the edition is maintained as a separate file under the JITM paradigm, a JITM system could modify the header file on the fly to include details of the encoding scheme that match the encoding of the perspective being generated.

The other modification concerns revision control. This is represented in the TEI header by the revision description element (<revisionDesc>). This element is irrelevant in the JITM paradigm but, to conform with the DTD, could be used in a manner similar to that described above: to insert, on the fly, details of the tag sets used to create the perspective being generated.

With regard to revision control, a sub-item in this section of the guidelines specifies the necessity of including the means to authenticate a file in the development of the edition. This is one of the major strengths of the JITM paradigm.

6.2.4 Support for other media

This section deals with the capability of the edition to support different non-proprietary media types. Since new tag sets can be created to support new media standards as they become available, the JITM paradigm has built-in extensibility in this area. A simple example is given in the guidelines, involving the capability to add line breaks into the transcription so that the text can be readily compared to an added facsimile image of an original page of a state of the edition. In a JITM system, the physical layout of the page (i.e. lineation and pagination) would be one of the first tag sets recorded for an edition. This information is normally recorded in the transcription process to assist in the correction of the transcription. In a JITM system it would then be extracted and stored as a tag set.

6.2.5 Archival Format

The guidelines recommend that the archival format of the edition should be non-proprietary, and both platform and software independent. The JITM paradigm uses only non-proprietary standards, and the file format of the transcription file is designed to be platform independent, as mentioned above. The requirement to be software independent is harder to comply with, since the strength of the JITM paradigm lies in the tools used to create the different perspectives of the work. However, if these tools are not available, the transcription file is still useful in its own right, as it is an authentic transcription of the original document with very little extra encoding added to the text, and so would be very easy to re-purpose.

This section also deals with the responsibility of maintaining a "master digital archive" of the edition. This is essential for any electronic edition and is as easily implemented in a JITM system as in any other. The JITM paradigm has some advantages in this area because of its built-in authentication scheme. Any transcription file can be authenticated as a true copy of the master digital archive as long as its calculated MDC matches that found in a tagRecord generated from the original master. This feature should go a long way towards reducing the potential proliferation of variant states of the edition.

This comparison of the JITM paradigm against the MLA guidelines has been favourable. The only aspects where it does not conform to the guidelines are areas where the current paradigm is weak, and a JITM system could be designed to conform to the guidelines for a specific perspective.
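The authentication check described in section 6.2.5 can be sketched as follows. This is an illustration only: this chapter does not specify the MDC algorithm, so SHA-256 from Python's standard hashlib module is used as a stand-in, and the TagRecord class, its fields and the sample transcription are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TagRecord:
    """Hypothetical stand-in for a tagRecord; only the stored MDC is modelled."""
    tag_set: str
    transcription_mdc: str


def compute_mdc(transcription: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever MDC a JITM system uses.
    return hashlib.sha256(transcription).hexdigest()


def is_authentic(transcription: bytes, record: TagRecord) -> bool:
    """A copy is a true copy if its MDC matches the one held in the tagRecord."""
    return compute_mdc(transcription) == record.transcription_mdc


# Hypothetical master transcription and a tagRecord generated from it.
master = b'<p id="p1">It was a dark and stormy night.</p>'
record = TagRecord(tag_set="lineation", transcription_mdc=compute_mdc(master))

# A copy with a single altered word no longer matches the stored MDC.
tampered = master.replace(b"dark", b"dank")

print(is_authentic(master, record))    # True
print(is_authentic(tampered, record))  # False
```

Because the check needs only the transcription file and any one tagRecord generated from the master, it can be run by a reader without access to the master archive itself.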
The JITM paradigm also has some extra capabilities which give it advantages over the guidelines' recommendations, while still maintaining the integrity of the transcriptions of the original states. Despite these advantages, the acceptance of the JITM paradigm is not guaranteed, because of the change in emphasis placed on the different parts of the information used to create an electronic edition in a JITM system. The next section deals with possible obstacles to the acceptance of the JITM paradigm by scholarly editors.

6.3 Problems with the Paradigm Shift

The current paradigm, its principles based on the physical publication of a book, freezes the edition's development at publication. If this is not done, there is a risk of the proliferation of multiple variant states of the electronic edition. The JITM paradigm is different in that it freezes only the transcription of the contributing states of the work at publication, and allows further development of meta-data applicable to the work by any number of people. Whether or not the reader believes that the JITM paradigm is better than the current methods for creating electronic editions, the acceptance of a JITM system depends more on the acceptance of the paradigm shift than on the software technology.

6.3.1 Issues of Ownership

This section looks briefly at the issue of copyright under the JITM paradigm. The author claims little experience with copyright law, but is aware that the laws on copyright do not adequately cover normally created digital documents. Furthermore, in a JITM system, the issue of copyright is even more complex than for static digital documents. The JITM paradigm logically separates the original transcriber's editorial content from the transcription. The ownership of the component parts of an edition now becomes an issue.
There is a strong case that the transcription files, being an exact transcription of the original state, may not be considered sufficiently altered by the transcriber for the transcriber to be able to claim copyright on the material. In this case, the transcription files would either be in the public domain or be owned by the holder of the copyright of the original author, and therefore the creator of the edition would not be able to claim royalties for the transcription files. Traditional methods for creating editions add extra material into the text of the transcription such that the editor can claim their own copyright on the finished product. It follows that the creator and/or publisher of the edition may only be able to claim copyright over their meta-data tag sets, and therefore may only be able to charge royalties for the use of these tag sets. Since the selection of tag sets is optional in a JITM system, the meta-data of the transcriber could be replaced with some other tag sets, and the creator would then have no claim for recompense from the users of the transcription file.

The perspective documents created in a JITM system are another area of concern with regard to copyright. Since a perspective is a virtual document created for displaying particular aspects of the transcription files of the edition, and is temporary in nature, could this fall under fair usage for scholarly research and therefore be exempt from any claim for royalties? What happens in the case where a perspective is generated from meta-data from a number of different sources? How is the copyright of a perspective attributed when it is an amalgam of different people's intellectual content? In fact, since it is only part of the tagRecord that is incorporated into the transcription file, does the creator of the tagRecord have any rights at all over a perspective document?

The crux of this issue (i.e. the acceptance of the JITM paradigm by the community of scholarly editors) is based on the financial aspects of publishing an edition. Firstly, the JITM paradigm does not fit with the current system used for copyright of printed material. Secondly, the issue of ownership of the transcription files could also make an edition less profitable for potential publishers, since they may not be able to charge royalties for material in the public domain or, in the worst case, may have to pay royalties themselves to the owner of the copyright of the original state. These issues make it an unattractive proposition for any publishing house to become involved in publishing an electronic edition that uses a JITM system, and so it is likely that the creator of the edition will have to take on the burden of publishing it themselves. This might deter potential users from using a JITM system.

A possible solution to this problem is that the transcription files be made available free of charge, but that the meta-data tag sets be licensed for use like computer software. The user would pay a one-time fee, which gives them unlimited usage of the meta-data. Users of a JITM system could purchase the tag sets they were interested in or, if desired, create their own and sell them through a broker to gain recompense for their efforts. This brokerage system could be associated with a network archive site, maintained by the edition's creator, which held reference copies of the transcription files. The distribution of tag sets could all be done electronically at minimal cost, and may therefore be a cost-effective commercial venture for the edition's original creator.

One final point that should be raised is that in the JITM paradigm perspective views are supposed to be temporary documents.
The permanence, and perhaps ownership, of a perspective view of a work is dangerous and should be discouraged unless there is a proper indication that the perspective should not be seen as an accurate transcription of the original work. The danger lies in the fact that the authenticity of a perspective cannot be guaranteed in perpetuity. Under the JITM paradigm the verification of a perspective occurs during its creation, and authentication cannot be guaranteed after this event. Having long-lived perspectives of a work risks their being used as an accurate transcription of the original work. Basing further work on a JITM perspective file which cannot properly be authenticated risks the creation of a new biased state of the original work.

6.3.2 Demoting the Editor

The JITM paradigm's devolving of the responsibility for the edition away from a specific editor could either "make or break" the acceptance of the new paradigm by the scholarly editors themselves. After all, the paradigm makes the transcription files the most important element of the edition, reducing the editorial contribution to a set of optional files which can be replaced with other tag sets if desired. While this facility of the JITM paradigm may be good for the edition, in that it allows the edition to be a continuously developing resource, it does downplay the contribution of the original creator of the edition. This demotion of the original creator to the level of just another supplier of optional meta-data could meet resistance amongst scholarly editors who perceive the creation of an edition as a way to immortalise themselves in "print". Also, considering the amount of work that goes into the creation of a fully annotated critical edition, it is understandable that any editor could be discouraged to realise that all their efforts can be ignored by the user of the edition.

Another aspect of the JITM paradigm is that methods of citation for editions created for a JITM system would have to be rethought.
A citation would presumably have to include enough information for anyone to recreate the perspective that it refers to, while still being brief enough to be understandable and manageable in the document. This could be difficult, as it would involve specifying all the tag sets used to define the perspective. This problem does not have a ready solution if the JITM paradigm does not become widely accepted.

These are all reasons why the more traditional methods for creating electronic editions might be favoured over a JITM system by potential edition creators. However, the reduced time to "publication" of the transcription files, and the unlimited extensibility of a JITM system edition, should make the paradigm attractive to those editors who feel limited by the current systems, which are still bound by the same limitations as a physical edition.