The following dissertation was submitted in part fulfilment of the requirements for the degree of MA in Linguistics & ELT at the University of Leeds, UK, in 1998. No part of this work may be reproduced without the author's full consent.
Observation of Japanese learners of English gives the impression that the article system is never adequately mastered, and that errors persist into the advanced levels of proficiency. In addition, female learners appear to perform better than their male counterparts. The current study sets out to investigate the performances of 44 adult learners in Japan using a questionnaire consisting of an error-recognition test and a written narrative task. The results were analysed according to Huebner's adaptation of Bickerton's 'semantic space' method of evaluation, and rates of accuracy compared statistically to measured overall proficiency levels. In addition, data was analysed in order to determine specific types of article error occurring within each specific NP type.
The findings show that learner performance is considerably better for written narrative than for error-recognition. In the latter task there was a positive correlation between article accuracy and proficiency level, but no such correlation was found for the former, as had been predicted.
Both tasks revealed that learners had fewest problems with definite references based on assumptions of shared knowledge. Most problematic in the error-recognition task were indefinite articles as first mentions of NPs. For the narrative task it became apparent that learners encountered great difficulty in marking both first and second mentions of NPs, with the definite article for the latter proving especially problematic when remote in the discourse from the original entity.
With regard to gender differences, females out-performed males in both tasks, relative to their proficiency levels.
The study concludes by suggesting a context-based experiential methodology for teaching the articles in Japan, in which function is stressed rather than the memorisation of abstract rules.
Any teacher of English as a foreign language in Japan is aware that his or her learners make a number of recurring, predictable, and seemingly intractable errors. First and foremost amongst these is the frequent misuse of the article system which appears to be a major stumbling block for Japanese learners, and which seems to pervade their output irrespective of proficiency level. As Swan and Smith (1987:218) have pointed out :
"Many Japanese learners achieve really creditable proficiency in all aspects of written English, except for articles and the number-countability problem..."
The root of this problem might logically be assumed to be the obvious differences between English and Japanese, the latter being a language lacking an article system (Kuno 1973:26), and being context-dependent in structure, rather than syntax-dependent as is the case with English (Mizuno 1985:9).
The persistence of errors with articles has been alluded to by a number of writers (Tarone 1985:377, Mizuno 1985:10), despite the fact that the study of English in Japan is widespread and that a comparatively high level of passive knowledge of English grammar is a prerequisite for the national university entrance examination. Indeed, all Japanese must study English for a minimum of three years at school, whilst 90% go on to study for six years, and 50% for eight years at schools, colleges and universities. (Swan and Smith 1987:222).
Yamada and Matsuura (1982:51) conclude that although Japanese learners are taught the rules governing the usage of English articles, "...they cannot use articles correctly simply because they cannot identify given items by means of semantic notions."
This inevitably leads one to suppose that the kind of teaching learners receive with respect to articles is largely ineffective. Native speakers teaching in Japan are aware of the problem but do not usually endeavour to teach much grammar, as the learners are deemed to have already acquired it at school, and in any case there is little material available for the purpose. (Swan and Smith 1987:218).
The purpose of this study is therefore centred around the following points:
The investigation centred around the following assumption, expressed as a hypothesis together with the corresponding null hypothesis :
In addition, a secondary hypothesis and corresponding null hypothesis were devised, relating to performance and gender :
The reasoning behind predictions (1) and (2) is as follows : despite the perceived difficulty of Japanese learners in article usage at all proficiency levels, the natures of the two tasks are likely to yield very different results. One would expect the results of the passive test to be more in line with the hypothesis, since the skills needed for the test are essentially receptive, and therefore it would not be unreasonable to expect that more proficient learners with a greater awareness of the abstract rules governing the article system (but not necessarily a greater ability in using them) would be able to identify more errors correctly than lower level learners.
With regard to the written narrative test, the prediction that there will be no correlation between proficiency level and accuracy of article usage is based upon the following notion : as Japanese learners are less accustomed to using the productive skills, and because they do not appear to be aware of the full range of contextually-dependent functions of the article system (as opposed to the abstract set of rules they memorise at school), then learners at all proficiency levels are likely to make similar errors. These errors are additionally reinforced by interference from Japanese, so that a general 'default' strategy is employed in which articles are, for instance, simply omitted to circumvent the problem.
With regard to the second hypothesis, the notion that women are more likely to out-perform men at the same proficiency level is based upon classroom observation and the idea that for cultural reasons Japanese women are more at ease than men in the language classroom (Swan and Smith 1987:213). The phenomenon may also be more widespread, since a number of studies relating to learners of varying nationality have encountered similar gender-based variance (Green and Oxford 1995:265).
Of common concern to all intent upon researching article error has been the difficulty of arriving at a suitable methodology and criteria for analysis, given the perceived complexity of the English article system.
Quirk (1972:149) summarises the range of use for the three articles [a/an, the, ø] thus:

Figure 1 : Quirk's categorisation of article usage
However, such neat schematic explanations do not take into account the fact that the use of articles is determined by both form and context, a potential source of difficulty for both learners and the researcher intent on analysing data for errors. (George 1972:94).
For example, consider the following sentences as responses to a test in which a subject is asked to write a description of what he or she sees in a picture :
(a) The woman is sitting on the bench.
(b) The woman is sitting on a bench.
(c) A woman is sitting on a bench.
(d) A woman is sitting on the bench.
At first glance, it seems that (c) is the response that best fits the requirements of the situation, i.e. that the two NPs as entities mentioned for the first time should be marked as such by the use of the indefinite article (what Brown and Yule (1983:169) refer to as 'new' information, as opposed to 'given', which is the realm of second mentions and is characteristically marked by the definite article). This is what, according to Quirk, would constitute a specific reference with indefinite form.
Does this then mean that the other responses should be assessed as incorrect? In actuality, a case can be made out for all of them. Emslie and Stevenson (1981:326) note that sentences such as (b) might mark the first NP with the definite article because:
"It is a common stylistic device in story-telling to use an initial definite reference to focus the listener's attention on a character who will be central to the story being told."
The first NP would therefore now be a specific reference with a definite article, according to Quirk.
In order for (a) to be acceptable we must turn to the notion of shared or background knowledge. The use of definite articles for both of the previously unmentioned nouns could be explained if the subject believed, quite logically, that the contents of the picture he or she was writing about were known equally well to the researcher.
An alternative explanation for (a) is that the writer is using the definite articles deictically; in other words they are assuming the role of demonstratives in the sense that 'the woman' means 'this woman here', in what Clark and Marshall (1981:37) term 'mutual knowledge based on physical copresence', in this instance 'prior', since the writer is assuming that at some time previously the researcher had viewed the picture and was thus aware of its contents.
Here we begin to see the inadequacies of Quirk's categorisation as far as our purposes are concerned : all three explanations for why the initial NP might be marked by a definite article would be designated simply as 'specific referents with a definite form', a definition which completely fails to distinguish their different functions.
Our final example (d) is a little less straightforward, but can be interpreted thus : 'the bench' is definite because it is assumed to be background knowledge and/or being referred to deictically as above, and the indefinite article for 'woman' is used in this instance in the sense of 'one', that is, as a quantifier.
What this means is that any attempt at article error analysis needs to be contrastive, whereby the researcher must reconstruct the message of the subject in a plausible way, since notions of form and context mean that a sentence could be grammatically acceptable but still erroneous because of the environment in which it appears. Learners' sentences must therefore be both acceptable and appropriate. (Corder 1981:37-40).
One attempt to categorise the articles according to semantic function was the system put forward by Bickerton and subsequently adapted by a number of writers in order to assess data such as that generated by the present study. Bickerton noticed in his work on Creole languages that these language types invariably divide NPs according to notions of specific/non-specific, which he argues is an innate division that has significant implications for the language learner (Bickerton 1981, cited in Adamson 1989:30). Since English provides no such clear-cut marking of specific/non-specific items (generics, for example, may take any one of the three articles), and since its articles are in fact governed by the additional criteria of supposed-known-to-listener and supposed-unknown-to-listener, Bickerton (1981:249) proposed the following 'semantic space' for English articles :

Figure 2 : Bickerton's 'semantic space' for the article system
This set of criteria was later adapted by Huebner (cited in Parrish 1987:363-365) into "...a system of analysis that accounts for article use in all contexts, that is to say, in all pre-noun positions," and utilised in a number of studies including Parrish (1987), Tarone and Parrish (1988), Adamson (1989), Thomas (1989) and Kubota (1994).
Huebner's system of classification was thus adopted for the present investigation, but has been modified slightly in order to provide information on the particular article used with reference to each semantic category. The following definitions and examples are from Parrish (1987:364-365) :

Figure 3 : NP classification system (SR= specific referent, HK= hearer knowledge)
According to this set of criteria, reformations, repetitions, proper nouns, and NPs in series ("the men and the women") are discounted from the data.
In addition to the eleven categories listed above, the present study adds an extra category, 12, consisting of expressions which are to all intents and purposes formulaic memorisations ("in the morning", "go home", etc.).
Each NP was also classified according to the appropriate article necessary for the context, where 'd' is the definite article, 'i' the indefinite, and 'ø' the zero article. Thus, the sentence "a woman is sitting on a bench", containing two NPs, would receive the following classification : 6i ; 6i. For further details, refer to the 'Method' section of this paper (page 20).
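The coding scheme just described (a category number from 1 to 12 followed by 'd', 'i' or 'ø') can be illustrated with a short sketch. The names `NPCode` and `parse_codes` are hypothetical and are introduced here purely for illustration; they are not part of the original analysis, which was carried out by hand.

```python
import re
from typing import List, NamedTuple


class NPCode(NamedTuple):
    """One classified NP: semantic category (1-12) plus the article required."""
    category: int
    article: str  # 'd' = definite, 'i' = indefinite, 'ø' = zero article


# A single code such as "6i": one or two digits followed by d, i or ø.
_CODE = re.compile(r"^(\d{1,2})([diø])$")


def parse_codes(text: str) -> List[NPCode]:
    """Parse a semicolon-separated string of codes, e.g. '6i ; 6i'."""
    codes = []
    for part in text.split(";"):
        match = _CODE.match(part.strip())
        if match is None:
            raise ValueError(f"not a valid NP code: {part!r}")
        category = int(match.group(1))
        # Eleven categories from the classification plus the extra category 12.
        if not 1 <= category <= 12:
            raise ValueError(f"category out of range: {category}")
        codes.append(NPCode(category, match.group(2)))
    return codes
```

Under these assumptions, `parse_codes("6i ; 6i")` yields two entries, each recording category 6 with the indefinite article, which corresponds to the classification given above for "a woman is sitting on a bench".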
Studies on article error have tended to be concerned solely with children, or have been longitudinal studies of acquisition. Of these, only a handful deal specifically with Japanese learners of English.
Hakuta conducted a longitudinal study of a five year old Japanese girl whose family had moved to the United States, and analysed oral data relating to the acquisition of articles. The study crucially failed to include the zero article, but showed that the definite article was acquired earlier than the indefinite. Hakuta noticed that as the subject's use of the articles steadily increased, there were many errors with specific/non-specific distinctions (1976:339), the persistence of which he puts down to the fact that "...Japanese does not have obligatory linguistic devices..." to mark such divisions (1976:338).
Yamada and Matsuura, on the other hand, chose to look specifically at adult learners of English at two different levels of proficiency : intermediate and advanced. Using a cloze test they demonstrated that overall accuracy rates were about 70%, with little variance between the two proficiency levels (1982:50). The results indicated that the intermediate group found the use of the definite article the least problematic, followed by the indefinite and the zero ; for advanced learners the order reversed the last two items, indicating that specific references were easier to comprehend than non-specific (1982:61). The commonest error recorded was the use of the zero article in place of the definite, with overspecification of nouns being commoner than underspecification.
They conclude with the comment that unless learners are taught to pay more attention to the article system, "...the chances are that the students will not improve their article deficiency noticeably." (1982:61).
On the whole, the study suffers from only employing one task type, the cloze test, which seems to be a somewhat artificial exercise in that it is not centred around productive output. In addition, the lack of rigorous criteria to assess the data for accuracy may cast doubt upon its findings : the writers imply that they marked as incorrect instances where the subjects had marked NPs with definite articles, "...presupposing that both the writer and reader already know the nouns referred to in the text." (1982:59). As we saw in our earlier discussion, these may not necessarily be grounds upon which to discount an answer's validity.
Tarone introduces the additional elements of task variation and a comparison between the performances of Japanese and Arab students. The study involved subjects taking (1) an error-recognition test, (2) a narrative task involving a picture sequence, and (3) an informal interview by a native speaker of English (1985:377-378). The results showed no overall difference between the accuracy rates of the two nationalities, but both exhibited poorer accuracy in the error-recognition test compared to the narrative. This went against the expectations of the writers, who had predicted that the 'looser' style needed for the narrative would have resulted in less attention to the articles and a consequent poorer rate of accuracy (1985:385). The importance of the study, however, is that it serves to illustrate the effect different task types have on learners' performances.
Mizuno's detailed longitudinal study involved a large sample of 350 Japanese high school and college students who were assessed by means of a cloze test, an error-recognition test, a word-rearrangement test, and a written composition. In addition, subjects were proficiency-tested and grouped into nine levels of ability (1985:11).
Mizuno found that beginners made frequent errors, some of which disappear as they progress (articles placed in the wrong sentential position), and some of which persist into the advanced stages (article omission, overgeneralisation, and substitution errors) and appear never to be fully mastered (1985:9-10).
Parrish also undertook a longitudinal study, this time of a single Japanese learner in which we see the appearance of Huebner's adaptation of Bickerton's 'semantic space' criteria for analysing NPs (see page 13). She found, in line with previous studies, that the definite article is acquired before the indefinite, with data for the zero article proving difficult to assess, since correct usages of 'ø' might in reality be due to the subject's tendency to omit articles as part of a communication strategy (1987:374). She concludes by noting that the article use of her subject, though frequently erroneous, was systematic rather than random, being "...governed by the semantic functions of NPs, lexical categories and attempts to keep linguistically related forms consistent with one another." (1987:381).
Subsequent studies have tended to use the same, or closely related methodologies. Tarone and Parrish re-evaluated the data from Tarone's earlier study (see page 15) using Huebner's system. This showed that task type influenced the kind of NPs employed, and hence the patterns of article use (1988:33). In general, accuracy in the field [+SR],[-HK] tended to be lower than that of [-SR],[+HK] and [+SR],[+HK]. Furthermore, the latter field was shown to exhibit variance in accuracy across different tasks due to differing communicative demands : for example, this type of NP was produced with only 50% accuracy in a grammar test, but with 90% accuracy in a narrative task. Tarone and Parrish explain this somewhat surprising result by stating that :
"It is simply more important for the speaker to mark such NPs accurately in producing an effective narrative than it is to attend to the same forms in a sentence-level grammar test." (1988:34).
In the more recent work of Kubota, the same system of analysis is retained in a study which seeks a comparison with Cziko's work (1986, cited in Kubota 1994:1) on child acquisition, in which it was found that the definite article is overgeneralised in first mention contexts. Kubota's study uses a cloze test and written compositions in a longitudinal study in which article accuracy is not related to proficiency level. Accuracy rates were found to support Tarone and Parrish's ideas on task variance, and, in keeping with other studies, the definite article in the field [+SR],[+HK] proved to be the least problematic, with the generic [-SR],[+HK] proving least accurate. Kubota asserts that his data supports Cziko's findings since the definite article is once again shown to be overgeneralised, thereby implying that learners initially associate the definite article with [+SR] (1994:23).
Master conducted similar investigations, but with subjects from a variety of L1 backgrounds in order to address the question of L1 interference with respect to languages with and without article systems. His results showed, as one would expect, that learners whose languages lacked articles fared worse than those whose languages possessed them. In addition, the former subjects omitted articles more frequently, confirming the effects of L1 interference for languages such as Japanese (1989:349).
Three studies on speakers of languages other than Japanese support the idea that learners whose native languages lack comparable article systems are likely to encounter similar difficulties in their acquisition of the English article system. Both Kharma (1981:341) and Agnihotri (1984:117) found that Arabic and Indian learners were most successful with the definite article, but poorer in using the indefinite and zero articles, whilst Adamson (1989:42) notes that Korean learners "...can have a high overall proficiency in English without the same level of proficiency for each individual morpheme." These findings all concur with those relating to Japanese learners.
Whilst there exists a considerable literature on gender differences in general, few studies deal specifically with second language learning, and none with article error.
Green and Oxford (1995), however, have looked at learning strategies in relation to both gender and proficiency level. They found that in general women tend to use more strategies than men, although these strategies are not necessarily the ones associated with more successful learners. In other words, they found no evidence to show that women are better learners of languages than men (1995:290).
They note that although gender differences which have been observed in many cultures are likely to be biological or social in nature, they might nonetheless "...have a real, if subtle, effect in the language classroom." (1995:266).
Whilst Green and Oxford would seem to be suggesting that men and women merely go about language learning in different ways, personal experience in the EFL classrooms of Japan indicates that, for whatever reason, females are more successful in terms of communicative abilities, pronunciation, the use of idiomatic expressions and finer grammatical points such as the articles.
In order to make explicit the context for our experiment, we will briefly restate our hypotheses. In general, we were interested in testing the assumption that for Japanese learners of English there is a positive correlation between proficiency level and the ability to recognise and use articles correctly. We predicted that, in terms of passive ability, the data would support such a notion. However, we also predicted that in terms of active written production, there would be no such correlation, and that the data would therefore tend to support the null hypothesis which states that there is no correlation between proficiency and accuracy of article usage. Furthermore, a secondary consideration concerned the differences in performance between male and female subjects, with the expectation that male performances with articles would be worse than those of females in relation to language proficiency.
The experiment was therefore devised in order to elicit data for the two variables, level of English proficiency, and accuracy with articles, from a set of subjects in a related design. The subjects completed a two-page questionnaire (see Appendix A, page 59) which included a brief section on personal information, a passive test designed to ascertain ability to recognise correct and incorrect article usage, and an active test which required the production of sentences to describe a pair of pictures, covering the data for the second variable. For the first variable, language proficiency, it was proposed to utilise pre-existing proficiency scores deriving from an oral test employed by the institution from which the subjects were drawn (the OPPT test - see Appendix B, page 62).
Whilst it would also have been useful to have conducted an additional oral test with regard to the elicitation of data for the second variable so as to allow comparisons between accuracy of article usage in both written and oral production, this was rejected on the grounds of practicality, since neither the time nor the logistics for the systematic recording of a large number of subjects were available to the researcher. The object of the experiment was thus to collect a range of data for both variables in order that a statistical analysis could be conducted to check the validity of the hypothesis, and in order to ascertain detailed information concerning the specific grammatical situations in which learners encountered problems.
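The kind of statistical check envisaged here can be sketched as follows. The text at this point does not name the statistic employed, so a Spearman rank correlation is assumed purely for illustration, on the grounds that proficiency levels are ordinal; the function names `average_ranks` and `spearman_rho` are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least two values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / (sd_x * sd_y)
```

Here `xs` would hold the proficiency level of each subject and `ys` the corresponding article accuracy score. A coefficient close to +1 would be consistent with the hypothesis, and one close to zero with the null hypothesis.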
Data was collected from the questionnaires completed by 44 Japanese learners of English enrolled at two branches of ECC Foreign Language Institute, a private English conversation school in the Japanese cities of Hiroshima and Kyoto.
The subjects break down as follows :

Figure 4 : Subjects according to age, sex, location and occupation
It is interesting to note that, as seems to be the case in general in Japanese English conversation schools, the number of female subjects outweighs the males. In addition we can see that the females were on average more than six years younger than the males, and comprised substantial numbers of both students and workers, whereas males were almost exclusively workers.
Although for purposes of statistical analysis an equal number of males and females would have been ideal, this was not possible to arrange due to the logistics of the data collection, a point discussed further below.
The method of data collection involved a two-page questionnaire, an example of which can be found in Appendix A (page 59) in both Japanese and English translation.
The precise nature of the inquiry, to collect data on article error, was not explicitly mentioned so as not to influence the performances of the subjects. In addition, the subjects were asked not to consult dictionaries or textbooks during the exercise, nor to collaborate with anyone else, in order to ensure as much as possible that the data was a true reflection of the subjects' individual unaided ability.
The test itself comprised two parts, one designed to assess the subjects' passive abilities in article error-recognition (the passive test, or 'PT'), and the other to assess active abilities with articles in the production of a short written narrative (the active test, or 'AT').
The passive test consisted of the following ten sentences, preceded by Japanese instructions inviting the subjects to mark each sentence right or wrong according to whether they could detect any grammatical errors. Sentences deemed to be incorrect were additionally to be marked by a circle around the part of the sentence where the error was considered to have occurred.
(1) Manchester is a city in north of England.
(2) Michael Jackson is a famous American singer.
(3) I went to British Embassy to get a new passport.
(4) She lived in large house near the river.
(5) My mother put the apples into the basket.
(6) It's not good idea to drink so much wine.
(7) The swimming is healthy, but boxing is not.
(8) Every Friday I go to supermarket to buy groceries.
(9) John listens to the radio in the morning.
(10) If the weather is fine, I'll go to the beach.
Figure 5 : Sentences used in the passive test
Of these ten sentences, numbers (1), (3), (4), (6), (7) and (8) each contained one article error, the remaining four sentences being correct. The errors broke down into the following types :
(a) Omission of the definite article - (1).
(b) Omission of the indefinite article - (6).
(c) Use of the definite article in place of ø - (7).
(d) Omission of the definite or indefinite article - (3), (4) & (8).
Figure 6 : Article error type in the passive test
In addition, half of the questionnaires featured the questions in a different sequence, namely (5), (6), (10), (7), (1), (2), (4), (9), (8) and (3). This was to counteract any possible ordering effects.
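Because two orderings were in circulation, responses recorded by position on the page need to be mapped back to the numbering of Figure 5 before they can be compared. A minimal sketch of that mapping is given below; the names `ORDER_A`, `ORDER_B` and `to_figure_numbering` are hypothetical, and the assumption that responses are recorded by page position is made only for the purposes of the example.

```python
from typing import Dict, List, Sequence

# Sentence numbers (as in Figure 5) in the order they were printed on each form.
ORDER_A: List[int] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ORDER_B: List[int] = [5, 6, 10, 7, 1, 2, 4, 9, 8, 3]


def to_figure_numbering(responses: Sequence, order: Sequence[int]) -> Dict[int, object]:
    """Re-key responses given in page order by their Figure 5 sentence number."""
    if len(responses) != len(order):
        raise ValueError("one response is expected per sentence")
    return {sentence: response for sentence, response in zip(order, responses)}
```

On the reordered form, for instance, the sentence printed seventh is sentence (4) of Figure 5, so the seventh response on that form is stored under key 4.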
The second page of the questionnaire consisted entirely of the test of active ability. This included brief instructions in Japanese inviting subjects to compose two sentences about each of the two pictures presented (See Appendix A, page 59). Subjects were informed that the pictures formed part of a story, were therefore connected, and that the sentences could take any form.
The rationale behind these particular design features was that it was felt to be important to make subjects aware of the relatedness of the two pictures, and that the objects portrayed in each (principally a young woman, a dog, two men, a cassette-player and a bench) were the same. This is because, as has been pointed out by Power and Dal Martello (1986:148), the objects need to appear at least twice in order to give the opportunity for the production of data in the important area of article usage as markers of first and second mentions. The pictures themselves were taken from a narrative task devised by Swan and Smith (1987:264).
A written task was chosen both for reasons of practicality (the time and access to subjects necessary for a recorded oral test were unavailable), and because, as Master (1990:464) notes, written tests are generally harder than spoken ones, thereby making them better tests of ability.
The subjects were encouraged to compose any kind of sentences they saw fit for two reasons : (i) so that they were free of restraints and would therefore produce sentences that were entirely natural to them, and (ii) so that the test could be undertaken by subjects of any proficiency level equally well. It was envisaged that lower-level subjects would concentrate on simple descriptive sentences, whilst those of a higher level would embellish their efforts with a wider range of stylistic devices. In either case it was expected that the production of a range of output would occur in which, according to the rules of English, either the definite, indefinite or zero articles would normally be required.
Proficiency scores were provided by the OPPT (oral proficiency placement test), a comprehensive set of questions developed and used exclusively by ECC Foreign Language Institute for ascertaining the levels of students, for class placement, and for the periodic monitoring of individuals' progress. The test consists of graded sets of questions labelled 5b, 5a, 4b, 4a, 3b, 3a, 2b, 2a and 1b respectively, corresponding to the proficiency levels used by the school (See Appendix B, page 62, for details of the OPPT test).
An initial trial run of the materials was undertaken in February 1998 in Leeds, during which six female postgraduate students of the university (3 Japanese, 1 Taiwanese, 1 Hong Kong Chinese and 1 Korean) completed both tests. The results proved satisfactory, as the participants reported no ambiguities or misunderstandings with regard to the instructions, which for the occasion had been presented in both Japanese and English to accommodate the non-Japanese taking part. In addition, the active test produced the expected range of article usage, and was thus unchanged prior to being sent out to Japan.
The data was collected during March and April 1998 by staff members at the two schools, who were given a number of explicit instructions regarding the conducting of the tests, together with a sample questionnaire to serve as a guideline.
Furthermore, the staff were requested to ensure that the questionnaires were filled in either before or after classes on the school premises rather than allow them to be done in learners' homes : the rationale behind this was the desire to prevent learners from consulting others or referring to textbooks, dictionaries and so forth, which would invalidate the results as their own spontaneous and individual output.
The staff were instructed, upon receipt of completed questionnaires, to fill in the learners' OPPT scores from the school records in the appropriate space at the end of the second page.
In general, the questionnaires seem to have been completed in the manner set forth above. However, there were a number of problems with those from the Hiroshima school : ten questionnaires completed could not be used, since the learners had not taken the OPPT test and the staff could only approximate their proficiency levels. Two further subjects had failed to complete the sentences for the second picture in the active test : however, their responses for the first picture still provided valid data which could be included in the sample.
From the Kyoto school, one subject had only completed the passive test : thus the overall sample for the active test was 43, one less than for the passive. Additionally, a further ten questionnaires which had been completed later than the others at the Kyoto school and could not be collected in person by the researcher, were subsequently lost in the post. Thus, despite a total of 64 questionnaires being completed, only 44 could be used for analysis.
The raw data from the usable 44 questionnaires collected was processed in the following ways.
The OPPT scores were converted to a scale more suitable for statistical analysis : 5b = 1, 5a = 2, 4b = 3, 4a = 4, 3b = 5, 3a = 6, 2b = 7, 2a = 8, 1b = 9 and 1a = 10. Contrary to the original system, it was deemed necessary to mark the lowest level as 1 and the highest 10 in line with the main premise of the investigation, namely whether or not higher levels of proficiency are related to greater accuracy with articles.
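The conversion described above amounts to a simple lookup table. The following is a minimal sketch in Python ; the names `OPPT_SCALE` and `convert_oppt` are illustrative assumptions and do not come from the original study.

```python
# Mapping from OPPT level labels to the 1-10 scale described in the text.
# The identifiers used here are illustrative assumptions, not the study's own.
OPPT_SCALE = {
    "5b": 1, "5a": 2,
    "4b": 3, "4a": 4,
    "3b": 5, "3a": 6,
    "2b": 7, "2a": 8,
    "1b": 9, "1a": 10,
}


def convert_oppt(level: str) -> int:
    """Return the numeric proficiency score for an OPPT level such as '3a'."""
    return OPPT_SCALE[level.strip().lower()]
```

On this scale a higher number always denotes a higher level of proficiency, so that a positive correlation coefficient can be read directly as 'greater proficiency, greater accuracy'.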
The passive test scores were determined in the following way : the subjects' decisions on the grammatical correctness of the sentences were matched against the correct answers and a score out of ten was arrived at. It had initially been envisaged that marks would be determined by two criteria, since the subjects had been instructed both to mark the sentences right or wrong and to circle the part of each sentence which contained a perceived error. This measure was intended to prevent occasions where a subject might have arrived at the correct answer but for the wrong reasons. For example, subject no. 39 marked sentence (7), 'She lived in large house near the river' as being incorrect, which was the right answer, but circled the word 'near' as being the source of the error, when it was in fact the lack of a definite or indefinite article preceding the NP 'large house'.
However, due to the ambiguities of the circling method (it was in many cases unclear what exactly a subject was indicating) and the fact that five subjects had either partially or completely failed to comply with the instructions and had not circled anything, it was decided to abandon specific error identification and opt for the more straightforward right/wrong judgements as the sole criterion for determining scores.
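The right/wrong scoring finally adopted can be sketched as follows. This is an illustration only : the function name, the answer key and the sample responses are hypothetical, and the actual key used in the study is not reproduced here.

```python
def passive_test_score(judgements, answer_key):
    """Count how many right/wrong judgements match the answer key.

    Both arguments are sequences of booleans, one per sentence:
    True means 'sentence is grammatically correct', False means 'incorrect'.
    """
    if len(judgements) != len(answer_key):
        raise ValueError("one judgement is needed for each test sentence")
    return sum(1 for given, correct in zip(judgements, answer_key) if given == correct)


# Hypothetical answer key for ten sentences (not the key used in the study).
example_key = [False, True, False, False, True, False, False, False, True, True]
# A hypothetical subject who misjudges the third and eighth sentences.
example_subject = [False, True, True, False, True, False, False, True, True, True]
example_score = passive_test_score(example_subject, example_key)
```

Because only the judgement itself is compared, a subject who marks a sentence as incorrect for the wrong reason (as subject no. 39 did) still gains the mark, which is the limitation acknowledged in the text.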
The ten sentences of the passive test were also categorised in terms of NP types according to the criteria outlined earlier (see page 13) so that it could be determined which types of article error subjects had the most difficulty in recognising. Account was taken of the fact that the erroneously marked NPs in sentences 3, 4 and 8 could be categorised in more than one way, since either the definite or the indefinite article could be used to correct them. It was intended that the results of the passive test be of value as a comparison with those of the active test, and an indication of the general awareness of the subjects with regard to articles.
The active test scores required a more complex means of calculation, since the free production of sentences obviously meant that different subjects would produce different amounts of language.
The following procedure was therefore adopted in order to produce meaningful scores expressed as percentages :
Each major set of data obtained, such as the OPPT, PT and AT scores, together with figures for average scores over both the PT and AT, was subjected to the standard descriptive statistical analyses.
In addition, in order to test our hypotheses, the PT and AT scores were both compared to the OPPT scores using the Spearman Rank Correlation Coefficient. This non-parametric test was chosen since the data did not meet the requirements of a parametric test : the OPPT data was based on categories rather than actual scores, some of the data sets exhibited skewed rather than normal distribution (especially for the AT scores), and the variance of the scores was not homogeneous, the AT scores being based upon a different scale to that of the PT and OPPT.
The test was deemed to be one-tailed for the OPPT and PT analysis and that of the combined PT and AT scores, since the object of our concern was to test for the existence of a unidirectional correlation in which high PT scores were associated with high OPPT scores. The OPPT and AT analysis was, however, performed as a two-tailed test, since here we were testing the notion that there was no particular correlation at all between the two variables.
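For readers unfamiliar with the statistic, the sketch below shows one standard way of computing the Spearman coefficient when tied scores are present : each data set is converted to ranks, tied values receive the mean of the ranks they occupy, and the ordinary product-moment correlation is then taken over the ranks. It is not the software used in the study, the function names are illustrative, and it returns the coefficient only (the one- or two-tailed p value must be obtained separately).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as Pearson's r on tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least two pairs")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sum((a - mean_x) ** 2 for a in rank_x)
    spread_y = sum((b - mean_y) ** 2 for b in rank_y)
    # Note: undefined (division by zero) if either variable is constant.
    return covariance / sqrt(spread_x * spread_y)
```

In the present context `x` would hold the converted OPPT scores and `y` the PT, AT or combined scores for the same subjects. Since many subjects share the same OPPT level, ties are unavoidable, which is why the tie-averaged form is shown rather than the simpler formula based on squared rank differences.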
Additional analysis was carried out to investigate the differences in performance between male and female subjects, and on the frequency of the different kinds of article error which were found in the data.
The experiment yielded the following sets of data :
The raw data relating to all of the above can be found in Appendices C and D (pages 64 and 65).
Figure 7 : Descriptive statistical analysis of OPPT scores
As can be seen in Figures 7 and 8, the OPPT scores exhibited a relatively normal distribution with a large number of intermediate level subjects and comparatively few at advanced and beginner levels.
Figure 9 : Descriptive statistical analysis of passive test scores
Figure 10 : Distribution of passive test scores
The passive test scores also reveal a normal distribution in which the majority of subjects' scores centred around the median.
Figure 11 : Descriptive statistical analysis of active test scores
Figure 12 : Distribution of active test scores
The results of the active test exhibit a strongly skewed sample in which an unusually high number of subjects achieved maximum scores.
Figure 13 : Descriptive statistical analysis of combined passive and active test scores
Figure 14 : Distribution of combined passive and active scores
The combined scores of the passive and active tests, which were calculated as the mean of the two scores, show a distribution in which mean, median and mode coincide almost perfectly at around 70%.
Figure 15 : OPPT, PT and AT scores according to sex
Figure 16 : Male and female test performances
Figures 15 and 16 indicate that whilst male subjects had on average a slightly higher proficiency level according to the OPPT than the females, and performed marginally better in the passive test, the female subjects scored considerably higher in the active test.
The OPPT and PT scores were analysed using the Spearman Rank Correlation Coefficient, and the data points plotted on the following scattergram :
Figure 17 : Scattergram of OPPT scores (X) and PT scores (Y)
There was found to be a significant positive correlation between the OPPT scores and PT scores in a one-tailed test where p = 0.0008, corrected for ties. The null hypothesis could thus be rejected.
The OPPT and AT scores were analysed using the Spearman Rank Correlation Coefficient, and the data points plotted on the following scattergram :
Figure 18 : Scattergram of OPPT scores (X) and AT scores (Y)
There was found to be no significant correlation between the OPPT and AT scores in a two-tailed test where p = 0.2417, corrected for ties. The null hypothesis could thus not be rejected.
Using the same Spearman Rank Correlation Coefficient test, the OPPT proficiency scores were compared with the average of the two test scores, and plotted according to the following diagram :
Figure 19 : Scattergram of OPPT scores (X) and combined PT and AT scores (Y)
There was found to be a significant positive correlation between OPPT and combined PT and AT scores in a one-tailed test where p = 0.0006, corrected for ties. The null hypothesis could thus be rejected. (Full details of the calculations for the inferential statistics can be found in Appendix E, page 66).
The five types of article error, classified according to our criteria, which appeared in the fixed sentences of the passive test were analysed in terms of the frequency of their failed recognition by the subjects :
Figure 20 : Distribution of error in the passive test
Figure 21 : Frequency of failed error recognition in the passive test (1=5d, 2=6i or 5d, 3=9i, 4=1ø and 5=6i, 5d or 1d)
From the above diagram it is clear that subjects' errors were fairly evenly distributed over the NP types in the sentences of the passive test.
In addition, errors were analysed according to their distribution between the sexes, producing the following chart :
Figure 22 : Passive test - sentence no. and NP type article error according to sex
Results show, with the exception of sentence 4, considerable variance between the sexes with males out-performing females in sentences 1 and 8, and females out-performing males for sentences 3, 6 and 7.
The data regarding article errors and the NPs in which they occurred is as follows (the full data being available in Appendix D, page 65) :
Figure 23 : NP type and article error in the active test
The salient points arising from the data can be better appreciated in the following bar chart, which for each major NP type occurring in the data, compares the frequency of occurrence to the frequency of error :
Figure 24 : NP types : frequency of NP occurrence compared to frequency of error
The bar chart above illustrates that there was a disproportionately large number of errors occurring in NPs of the '4d' variety. In contrast, there were no errors at all recorded for '5d' types.
The data for the four NP categories in which errors occurred were then plotted according to proficiency level as follows :
Figure 25 : Accuracy rates of NPs compared to proficiency (Series 1=1d, 2=4d, 3=6ø, 4=6i)
The diagram reveals largely erratic curves for each of the NP types in which no significant trends can be discerned.
The frequency of errors was then analysed according to the gender of the subject for each NP type :
Figure 26 : Distribution of errors between sexes (%) in the active test
Figure 26 shows a high degree of uniformity between the sexes, with the sole exception of generic '1d' types, in which female performance was considerably better than that of the males.
All NPs containing article errors were then analysed further within their categories according to the specific kind of error that the subjects had made. The following table summarises all the article error types which were found in the data :
Figure 27 : Article error type
Using these criteria, the frequency of article error type was plotted within the NP types in which errors had occurred :
Figure 28 : Frequency of article error types according to NPs (%)
From the above chart and preceding criteria we can see that for '4d' type NPs, the major causes of error were the omission of the definite article (category 'b'), and the use of the indefinite article in place of the definite (categories 'a' and 'd'). '1d' NP error involved solely the omission of the definite article, whereas the major errors in '6i' and '6ø' NPs involved the misuse of the definite article in place of the indefinite and zero articles respectively.
Specific errors were not broken down according to proficiency level, since the samples were deemed to be too small to reveal any meaningful trends.
Finally, the errors in the active test were assessed according to their distribution between the sentences composed for the first picture and those composed for the second picture in order to observe errors with regard to the notion of first and second mention of objects:
Figure 29 : Distribution of errors between pictures in the active test
Here we can see that there were proportionally more errors associated with the second picture, and that these were confined to '4d' errors. The other three NP types in which errors occurred were all associated with the first picture.
Mean accuracy rates for the two tests differed considerably, with 57.5% for the PT markedly lower than the 83.8% for the AT. This mirrors the results of Tarone (1985:381), although her study differed in that an error-recognition test was coupled with an oral rather than a written productive test. Nevertheless, her results are strikingly similar, with 56% and 83% being obtained respectively.
There seem to be two possible explanations for the surprisingly low mean scores of the PT. On the face of it, one would have expected better results from this test, since it involves merely the recognition of correct usage, whilst the AT involves the more complex task of having to select and deploy the correct articles as part of a whole range of grammatical decisions.
Tarone and Parrish, as we noted earlier (see page 17), believe that at least as far as oral production goes, speakers tend to pay greater attention to articles on narrative tasks because of the important part they play : in other words, the referential functions of articles are vital if the speaker is to communicate the story in a coherent manner such that the listener can follow and make sense of it (Tarone and Parrish 1988:34).
While this is plausible enough, it only accounts for the comparatively high scores on the active test (since it is most likely true for written as well as oral narrative) : we must still consider the question of why the subjects fared so badly on the seemingly easier passive test. The answer may lie in the design of the test ; as the students were not told that they were specifically looking for article errors, they may have assumed that the grammatical errors to be found were of a different nature, given what we have already said (page 6) concerning the tendency of Japanese learners not to attach enough importance to articles due to interference from their own language.
With reference to the distribution of the results for the two tasks, it can be readily appreciated by looking at Figures 10 and 12 (pages 31 and 32) that whilst the distribution of scores for the PT was normal, for the AT an abnormally high number of subjects (18 out of 43) achieved maximum marks. This can be explained in two ways :
Turning to the results of the inferential statistics (see page 34), it is immediately apparent that both predictions (1) and (2) (page 8) have been confirmed by the data. That is, the results of the passive test indicate a direct relationship between proficiency level and article accuracy (comparable with the findings of Mizuno, 1985:17), whilst those for the active test display no such correlation.
This leads to the conclusion that although the subjects were obviously aware of the article system, since average scores in both tests were considerably higher than would have resulted from a random deployment, some factor was responsible for changing article error from being proficiency-related in the passive test, to being endemic and unrelated to proficiency in the active test.
The likeliest explanation derives from the finding (discussed in detail from page 44) that subjects had the greatest difficulty in dealing with the marking of first and second mentions of NPs. This explains the different results for the two tests, since the passive test, consisting of isolated and unrelated sentences, did not cover this important area of discourse, and thus exhibited a positive correlation between proficiency and accuracy. The active test, being based upon the output of discourse, necessarily required from subjects an ability to utilise articles for marking 'given' and 'new' information if it was to be completed successfully, a deliberate feature of the design.
The implication of this is that the function of articles as markers of first and second mentions is an area of usage in which the subjects were not competent, or which they had never learned, since it proved to be problematic at all levels of proficiency, being the root cause of the lack of correlation between proficiency and accuracy in the active test.
It seems likely that interference from Japanese also played a part in subjects' poor performances in this area, given the observed tendency of Japanese learners to omit articles in all forms of discourse.
The erroneous sentences in the PT contained only five NP types, 5d, 6i, 1ø, 1d and 9i. Scores were fairly evenly distributed among them (see Figure 21, page 36), with subjects having the fewest problems with the omission of the definite article in the field [+SR],[+HK] :
"Manchester is a city in_north of England."
The next least problematic were the erroneous use of the definite article in place of zero in the field [-SR],[+HK], and the omission of the obligatory indefinite article for [-SR],[-HK] :
"The swimming is healthy, but boxing is not."
"It's not_good idea to drink so much wine."
Most difficulty in spotting errors occurred in the three sentences in which the omitted article could have been replaced by either the indefinite or the definite article, depending on the contextual interpretation :
"I went to_British Embassy to get a new passport."
"She lived in_large house near the river."
"Every Friday I go to_supermarket to buy groceries."
More than 45% of all errors occurred in these three sentences, but their ambiguity means that they are difficult to analyse.
Concentrating on the first three, unambiguous, sentences we find that the order of accuracy in error-recognition is as follows : 'the' > 'ø' > 'a'. This is identical to the results of Yamada and Matsuura's advanced learners in a cloze test (1982:61), and the fact that the definite article in the field [+SR],[+HK] was the least problematic is supported by the findings of Tarone and Parrish (1988:34) and Kubota (1994:23).
Of the twelve types of NP that were generated in the active test, only four were produced with examples of erroneous article use. With the exception of 5d types, which constituted 22% of all NPs, and were error-free, all other NP types were produced in very small numbers (see Figures 23 and 24, page 37).
The degree of accuracy in each of the five most frequently produced NP types is as follows :
5d [+SR],[+HK] > 6ø [+SR],[-HK] > 1d [-SR],[+HK] > 6i [+SR],[-HK] > 4d [+SR],[+HK].
The results are at first glance highly surprising in that the same article in the same semantic field can be found at both ends of the scale. However, consulting our list of criteria (page 13), we see the difference between 5d and 4d NP types is both crucial and telling.
5d NPs are defined as 'specific referents assumed known to hearer', whilst 4d NPs are 'referents previously mentioned in the discourse.' The former are thus contextual assumptions concerning background knowledge, whilst the latter are discourse markers of second mentions. This functional difference is made more explicit in Figure 29 (page 40), where it is shown that all 4d errors occurred in the sentences associated with Picture 2 (although a number of correct 4d NPs did occur for Picture 1). 5d NPs, all correct, were distributed between the two pictures, with the majority being associated with Picture 1.
The implication is that learners have great difficulties with regard to the marking of 'given' and 'new' information, particularly where second mentions are concerned. More specifically, the fact that the small number of 4d types occurring for Picture 1 were correct also indicates that within a single sentence the concept of first and second mention can be successfully used, but when remoter in the discourse (in the context of the second picture), the connection between first and second mention is somehow broken. Given the clarity of the instructions, and as is evident from the collected data, this was not a case of subjects mistakenly regarding the two pictures as unrelated.
Turning to the specific nature of the errors made by subjects (Figures 27 and 28, page 39), it is apparent that the most frequent cause of error in 4d NPs was the omission of the definite article (45%), followed by the erroneous use of the indefinite (27%) ; the remaining significant error types (22%) involved the incorrect use of the indefinite article in phrases intended to convey the meaning "one of the...". For example, one subject's second sentence for Picture 1 was :
"Two men are walking in a park."
However, the first sentence for Picture 2 was :
"A man is going to throgh [sic] a radio."
Having already identified the two male protagonists in the first instance, the second sentence is erroneous not simply because it is diverging from the English system of first and second mentions, but also because it is ambiguous in that it fails to identify which of the men is acting. If, however, the writer had intended 'a' to mean 'one', then the natural expression of this notion by a native speaker would be "One of the men is going to throw..."
Given the prevalence of this type of error, the conclusion must be that learners are not familiar with such constructions and are confusing the different functions and uses of 'a' and 'one'.
The extraordinary success the subjects had with 5d types is indicative of the fact that they understand and are able to correctly employ the definite article in a deictic role, or where the referents are deemed to be already known to both reader and writer.
The next most problematic NP type was 6i, that is, use of the indefinite article to indicate first mention of a specific NP. This further strengthens the impression that 'given' and 'new' is the most problematic aspect of article usage for Japanese learners. Figure 28 (page 39) shows that two specific types of error occurred in 6i NPs. The most prevalent (62.5%) involved the definite article being used in place of the required indefinite, while the familiar tactic of article omission accounted for the remainder.
1d NPs, accounting for 13% of all errors, consisted uniformly of the same mistake, namely, the omission of the definite article for a generic expression. That learners have difficulties with generics is not surprising, given the fact that they can be marked by all three of the articles in English ; that errors consisted solely of article omission suggests interference from Japanese.
Our final class of NPs in which errors occurred, 6ø, accounted for 8% of total errors. Two specific types of error can be identified from Figure 28 (page 39): the required zero article being replaced by the definite (67%), and by the indefinite (33%). As only three instances of error were recorded in this category, it is difficult to draw any conclusions, except to note that the fact that English can express NPs in the field [+SR],[-HK] by using both the indefinite and zero articles, depending on the context, may be confusing for learners.
Overall, the commonest error among all NP types was omission, being the sole error type for 1d NPs, the most frequent for 4d NPs, and accounting for more than a third of 6i NPs. That omission should emerge as a popular strategy is unsurprising, and clear evidence of interference from Japanese : in this respect it appears to function as a kind of 'default setting' reverted to in instances of doubt or confusion as to the correct choice of article.
Figure 25 (page 38), plotting accuracy for each NP type against proficiency level, reveals erratic patterns from which no real trends can be discerned. This applies equally to each of the four NP types in which errors occurred, although for 4d types something of an acquisition curve can be observed in the regular increase in accuracy from proficiency level 4 upwards. However, this is preceded by higher levels of accuracy for beginners, which detracts from its validity. The inescapable conclusion is that, as was the case for total AT scores, there is no correlation between proficiency level and accuracy within any of the individual NP types. It must be borne in mind, however, that the small samples involved for some NP types make any assumptions far from concrete.
That our expectations concerning gender differences were confirmed can be seen in Figures 15 and 16 (page 33). In line with our prediction, the more comprehensive active test did show a considerably better performance on average by female subjects, whose 87% contrasted with the males' 75%, a result even more surprising when it is realised that the females were on average at a lower level of proficiency.
In the passive test males performed marginally better than females (5.85 compared with 5.71), but fractionally lower than would have been expected, given their advantage in proficiency.
Turning to specific error types, Figures 22 and 26 (pages 36 and 38) reveal some interesting results for both tests. The passive test shows that while accuracy in two of the categories was roughly equal in gender distribution, the others display considerable variance : for sentences 1 (5d) and 8 (6i, 4d, 1d) males out-performed females by 20% and 15% respectively, whilst for sentence 7 (1ø) females out-performed males by 20%. This last result ties in with the only gender difference found in the active test data, which were otherwise remarkably uniform : females again bettered males, but in this instance by a massive 60%.
We have already noted how task type affects article accuracy : could it also affect the way male and female subjects perform? Personal classroom observation in Japan indicates that male subjects tend to be noticeably more reticent and less proficient at cohesive discourse in oral tasks : one can only speculate, therefore, about how this might spill over and affect the medium of written discourse found in the active test. Even if the overall weaker performance of males can be explained by this, it is still difficult to discern why the generic category 1d was the only NP type in which any kind of gender difference occurred, since generics do not play any great role in the cohesion of discourse.
The wider range of differences found in the passive test is also difficult to account for satisfactorily.
One further possibility, beyond the scope of the present study, is that L1 interference might explain the differences, since Japanese is a language in which there exists a considerable divide between the language used by each sex, in terms of lexicon as well as such non-grammatical aspects as intonation.
On the other hand, the differences encountered in the data may be unreliable due to the limited nature of the sample, in which case a larger number of subjects would need to be tested in order to see whether the findings can be replicated.
Since our results have shown that article errors are persistent and endemic among Japanese learners of English, and that in productive tasks these errors seem to affect learners regardless of proficiency level, the present state of affairs with regard to English language teaching is clearly deficient.
This deficiency stems from two main sources :
Given the results of this and other related studies, it is apparent that the rote-learning of grammatical rules should be replaced by some sort of graded experiential method in which the articles can be acquired with regard to their semantic function. That is to say, learners need to approach the articles in such a way that they can appreciate their use in context.
The question of how to present the articles in the classroom has been touched upon by a number of writers. Mizuno (1985:24) asserts that the articles should be presented in sequence according to their relative degree of difficulty, with the easiest being presented first. Thus, on the basis of his experimental studies, the definite article would be tackled first, followed by the indefinite, and finally the zero. However, Mizuno goes into no further detail, and fails to address the consideration that the same article may have several differing functions, each presenting different problems for the learner.
Whitman (1974) believes that the definite and indefinite articles should be presented in terms of their basic functions as determiners and quantifiers respectively (1974:254). He recommends the introduction of the indefinite first, beginning with singular and plural count nouns, then moving on to contrast these with plural and generic non-count nouns. The definite article is then introduced, first in its deictic demonstrative role, then as a marker of second mentions (1974:259).
Master (1990:466) also suggests a functional approach in his binary system in which NPs are reduced into items which are either classified or identified. He then proposes to introduce learners to the following diagram in order to help them choose the correct article for any given NP :

Figure 30 : Master's binary system of article classification (1990:470)
Quite how this information could realistically be given and explained to low level learners is not elaborated on, and consequently the idea, though of theoretical interest, does not seem to be a fitting tool for actual classroom use.
Master goes on to propose (1990:471) teaching the idea of first and second mentions as a first stage, followed by the concepts of shared knowledge and referents, in an order contrary to that put forward by Whitman.
As a result of the data obtained in the present investigation, the following are proposed regarding the language classroom :
Non-specific reference
The above recommendations are necessarily lacking in specific classroom practices, since this falls beyond the scope of this report. In any case, given such a framework, skilled and resourceful teachers will have little difficulty in devising or adapting particular activities for their presentations.
The following conclusions have been drawn from the present study with regard to the abilities of Japanese adult learners of English in article usage :
The full bibliography (29 items) is available upon request.
Overall data
Full error analysis data for AT results