Friday, May 22, 2009

Kang (1995) The Effects of a Context-Embedded Approach to Second Language Vocabulary Learning

This is another paper I am reading as part of a meta-analysis of different approaches to teaching vocabulary. It is a very interesting study that showed big improvements not only in vocabulary retention on straightforward recall tests, but also in listening comprehension and knowledge transfer. There were four experimental conditions: teacher-led class study (P&P), computer study lists (CW), computer study lists with pictures (CP), and the clear leader, computer study based on learning vocabulary in the context of a narrative (CC). I would really have liked to see some example images from the interface. The results seem to show that learning simple vocabulary in the context of narratives is highly effective.

I think it is fascinating that students in the context condition (CC in the above diagram) outperformed the others on pure recall tests. I had been starting to form the impression that preparing for a particular form of test is the best way to improve performance on that test, but in this case it seems that preparing for knowledge transfer also led to better scores on straight recall tests. That said, I'd need to see interface images to check whether this really does contradict the kind of cross-matching we see in Groot (2000), where concordancing practice boosts performance on concordance tests but not on straight paired-associate recall, and vice versa.

Kang cites lots of relevant theory such as "inert knowledge" (Brown et al., 1989) and "cognitive embedding" (Ausubel, 1968), which links up with points I was making in a recent journal paper I co-authored with Maria Uther of Brunel University in the UK: Joseph & Uther (2009). In that paper I referred to some research (Chi & Koeske, 1983) indicating that information that is more deeply embedded in an individual's knowledge network is likely to be remembered longer. Although this is a common theme in the literature on memory and second language acquisition, the references that Kang cites are different from those that I have been aware of so far. Of course that is not so surprising, but it gives me further pointers for linking up this concept as far as computer-assisted language learning goes.

One query I have is about the reliability coefficient that Kang reports for each type of test. My (possibly flawed) understanding of reliability is that one is attempting to work out how effective some measure is at assessing an individual construct, such as in a social science questionnaire where multiple questions attempt to probe the same underlying construct, like racism or sexism. However, in an experiment like Kang's, each test item is attempting to measure the learner's knowledge of a particular word. There are no repeat measurements using different instruments, except to the extent that Kang employs three types of vocabulary tests. One could assess reliability across those tests, but Kang reports reliability for each individually.

The only way I can make sense of a reliability coefficient for a single type of test on multiple words, over multiple learners, is that we are thinking of knowledge of multiple words as a single construct and are assessing the reliability of the test instrument in those terms. However, since the learner may have had individual difficulties with each word separately (i.e. they are likely to learn different words at different rates), that doesn't quite make sense, unless all the words are very similar in terms of abstractness, visualizability, frequency, etc. In that case we have determined that each test is probing the user's learning of the same sort of word, e.g. the reliability of a test type such as productive recall for assessing learning of concrete nouns. Kang describes the vocabulary used in the study as common everyday words such as household items and routine activities. Anyhow, I don't think this is a serious criticism of a very interesting study, and it's highly likely that I am misunderstanding the meaning of reliability measures in this context, but it does seem a little like a situation where the statistical software generates reliability measures and they are reported verbatim without assessment of their suitability for the experiment in question.
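To make the "multiple words as a single construct" reading concrete, here is a minimal sketch of how an internal-consistency coefficient could be computed for one test type. It assumes the coefficient is Cronbach's alpha and that each word is scored 0 or 1 per learner. The function name `cronbach_alpha` and both score tables are hypothetical illustrations, not data or code from Kang's study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a learners-by-items score table.

    scores: list of rows, one row per learner, one column per test item
    (here, one column per word). Hypothetical helper for illustration.
    """
    k = len(scores[0])  # number of items (words) on the test
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two learners")
    # Variance of each item's scores across learners.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of each learner's total score across learners.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up data: learners who know one word tend to know them all,
# so the items behave like probes of a single construct.
consistent = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]

# Made-up data: knowing one word says little about knowing another.
inconsistent = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

print(round(cronbach_alpha(consistent), 2))
print(round(cronbach_alpha(inconsistent), 2))
```

In this sketch the first table gives an alpha of 1 and the second an alpha of 0. That matches the worry above: if learners acquire each word more or less independently, a per-test coefficient of this kind would be expected to drop, so a high value would suggest the words are behaving like one construct.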

Chi, M.T.H. & Koeske, R.D. (1983) Network representation of a child's dinosaur knowledge. Developmental Psychology 19(1) 29-39.

Joseph, S.R.H. & Uther, M. (2009) Mobile Devices for Language Learning: Multimedia Approaches. Research and Practice in Technology Enhanced Learning 4(1) 1-26.

1 comment:

Yi-Jiun (or Angela) said...

Brown, JD. (2005). Testing In Language Programs: A Comprehensive Guide To English Language Assessment.

It might be interesting to read C.8 (on language test reliability) and C.10 (on psychological constructs).

On p. 196, the author sums up that if all other factors are held constant, the following statements are usually true:
3. “A test made up of items that assess similar language material tends to be more reliable than a test that assesses a wide variety of material.”
4. “A test with items that discriminate well tends to be more reliable than a test that does not discriminate well.”

According to the book, I guess these might be the reasons why Kang reported the Cronbach's alpha coefficient. It’s at least a good way to show readers how well her in-house assessment worked. In addition, it seems to me that a construct is more abstract than individual words. It might refer to vocabulary development in this case.