Evaluation Checklists and Frameworks (Pt2)

In this post I want to look at what goes into the construction of an evaluation framework. This means determining the format of a checklist, which is likely to produce tensions between breadth and depth, and between informativity and economy. Moving from an impressionistic pre-use evaluation toward a comprehensive and more informative one requires careful deliberation over the micro- and macro-characteristics that make up a criterion-referenced checklist. What is clearly vital at the start of the evaluation process is consideration of both the learners’ needs and the context of use.

The micro-considerations for this area of evaluation are the characteristics of the learners (age, level, learning style, socio-cultural background, etc.), the learners’ needs (language, skills, functions, language systems, etc.), the teachers who will use the materials (experience, confidence, methodological competence, etc.), and the programme and institution (level within the educational system, timetable, physical environment, public or private sector). The macro-features focus on the external context: the aims of education (examination systems, curriculum content, language policy, role of the target language within that country) and the aims of language education (national syllabus, cultural and religious considerations) (McGrath, 2002).

As mentioned in my previous post, there is still a lot of debate about the use of analysis questions in evaluations, and authors differ in their approaches and in the steps (micro) and stages (macro) involved in an evaluation. It is clear that there is a benefit in performing a pre-evaluation analysis. The implication is that this should separate the wheat from the chaff by highlighting key objective aspects that materials must contain; if they do not, the materials can be rejected. From an informal scan of materials, accurate assumptions can be made about the layout, images, and the types of skills being assessed. An analysis can be characterised as impressionistic, checklist-based or in-depth, the idea being that these analytical descriptions are then compared to the identified needs before moving on to an evaluation. The materials analyses that I encounter are of the impressionistic kind and rely on ‘flick tests’. My assumption is that this is due to the economy of time and the lack of training given in this field. It does not comfort me that important decisions about materials are made in this way. This module has gone a long way towards changing my approach to, and understanding of, these decisions.

What does it take to create a checklist for an evaluation of ELT materials that is holistic and economical?

McGrath (2002) offers three approaches to checklist design. First, you can borrow and adapt checklists that are available to you for your own context. Second, you can brainstorm and draft your own original fit-for-purpose checklist. The final option is to research the people the materials immediately affect (teachers and learners) and find out what is important to them. From my experience, the evaluations I perform sit somewhere between the first and last of McGrath’s options. The criteria I use (not a hard-copy checklist, but an internal one) are based on my own personal context: my students, my principles and observation of the characteristics of the class, e.g. level, age, culture.

McGrath (2002) suggests the following steps for designing a checklist:

  1. Decide general categories within which specific criteria will be organised
  2. Decide specific criteria within each category
  3. Decide ordering of general categories and specific criteria
  4. Decide format of prompts and responses.
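
To make these steps concrete for myself, here is a minimal sketch in Python of how a checklist might be organised following McGrath’s four steps. The categories, criteria and response format below are invented examples of my own, not McGrath’s.

```python
# A sketch of a criterion-referenced checklist following McGrath's four steps:
# 1. decide general categories, 2. decide specific criteria within each,
# 3. decide ordering (list position here), 4. decide prompt/response format.
from dataclasses import dataclass, field

@dataclass
class Criterion:
    prompt: str                            # the question put to the evaluator
    response_format: str = "likert_1_5"    # e.g. 1 = very, 5 = not at all

@dataclass
class Category:
    name: str
    criteria: list[Criterion] = field(default_factory=list)

# Ordering is expressed by list position; all names are hypothetical.
checklist = [
    Category("Learner fit", [
        Criterion("Is the content relevant to the learners' needs?"),
        Criterion("Is the level appropriate for the target class?"),
    ]),
    Category("Instructions", [
        Criterion("Are the instructions succinct and self-standing?"),
    ]),
]

for category in checklist:
    for criterion in category.criteria:
        print(f"[{category.name}] {criterion.prompt} ({criterion.response_format})")
```

Even this toy version forces the design decisions McGrath lists: which categories exist, what sits inside them, in what order, and how responses are recorded.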

As a means of comparison, Tomlinson (2013) suggests brainstorming a list of universal criteria applicable to any language learning materials, then deriving principles of language teaching and learning from classroom observation. This should provide a fundamental basis for materials evaluation. Following this, the criteria need to be sub-divided to help pinpoint specific aspects to be revised or adapted:

For example, if looking at instructions, are they:

  • succinct
  • sufficient
  • self-standing
  • standardised
  • separated
  • sequenced / staged

The universal criteria need to be revised and monitored to maintain consistency and validity. Questions should reflect evaluators’ principles, but not impose a rigid methodology as a requirement of the materials; that could lead to some materials being dismissed due to pedagogical bias or assumptions. Are questions reliable, so that other evaluators would interpret them in the same way? Are the terms and concepts applicable across differing interpretations of applied linguistics? If not, it is suggested that they be avoided or glossed (Tomlinson, 2013). I can relate to this point: there have been occasions, especially early in my career, when my knowledge of meta-language and learning theories felt insufficient to contribute to a discussion about the pedagogical merits of materials. My lack of experience meant that I did not have the depth of knowledge to interpret tasks from different contexts and needs.

Williams (1983, as cited in McGrath, 2002) makes the salient point that “a checklist is not a static phenomenon”. Every context is different, and therefore the strength of a list of criteria is only relevant to the situation in which it is to be used. ‘Off-the-shelf’ checklists are likely to need adjustments to suit different contexts. Categories and checklists are supposed to be instruments of objective analysis, evaluation and observation, but they are as much a reflection of the time at which they were conceived and of the beliefs of their author (in the same way as the materials they are being used to evaluate). Learning theories have evolved and will continue to do so, and evaluation must observe this fact too. You would not review an early mobile phone using the same criteria you would apply to a modern smartphone; it simply is not the same device any more.

The evaluation checklist has to be relevant to the ELT context in which it is to be used. The framework for the checklist should be criterion-referenced, based upon the principles that the evaluator(s) believe are most apt. The issue of what should be included in a checklist and what is superfluous is where evaluation can become very muddied and complicated. Tomlinson (2013) makes a valid distinction between general criteria (the essential features of any good teaching-learning material) and specific, context-related criteria. In other words, general criteria are essential, and specific criteria can only be determined on the basis of individual circumstances. McGrath (2002) believes that moving from general to specific criteria will lead to the identification of a set of core criteria to be applied in any situation, irrespective of evaluation method. He also suggests that once the general criteria have been tentatively decided, the next step of populating the checklist with specific criteria that are comprehensive and relevant is potentially ‘messy’. McGrath advises that reference to published checklists may help in avoiding over-zealous and context-heavy criteria.

I feel there are some excellent points raised here, because it is important for my principles to run alongside what the ELT domain already holds to be good practice and essential to language learning. The general criteria must present a holistic picture of ELT. The general criteria may very well remain, but the specific ones will shift and adapt to the context and to research developments. A general framework will inevitably lead to a more holistic and economical method of evaluation. For a consistent and balanced framework, the general criteria must consider current learning theories based on the research findings that are most convincing and applicable in ELT.

Tomlinson (2013) suggests the following generally agreed-upon criteria:

  • Deep processing: semantically focused processing of the meaning of the intake and its relevance to the learner.
  • Affective engagement
  • Mental connections: between new and familiar.
  • Experiential learning: apprehension before comprehension
  • Learners’ need to want to learn
  • Multidimensional processing – sensory imagining, affective association, use of an inner voice, learning experiences with emotions, attitudes, opinions and ideas.
  • Informal personal voice: more likely to facilitate learning than one which is formal and distant.
  • Informal discourse
  • Active rather than passive
  • Concreteness e.g. examples and anecdotes
  • Inclusiveness
  • Sharing experiences and opinions
  • Occasional casual redundancies rather than always being concise

In addition to theories of ELT, there are also Second Language Acquisition (SLA) theories to consider. Adding to an already tricky task are the inconclusive and controversial variants in this field. This reinforces the idea of avoiding rigidity; I must be careful here not to hold too tightly to my principles and beliefs, and to allow other recognised pedagogical factors to influence my criteria. Tomlinson (2013) suggests that some of the agreed-upon ideas are:

Materials should:

  • Achieve impact
  • Help learners feel at ease
  • Ensure that what is being taught is perceived as relevant and useful by learners
  • Facilitate learner investment
  • Ensure learners are ready to acquire what is being taught (linguistic and developmental readiness, and psychological readiness)
  • Expose learners to authentic language use: semi-planned and unplanned discourse requiring a mental response
  • Draw learners’ attention to linguistic features of the input
  • Provide opportunities to use the target language to achieve communicative purposes
  • Take into account that the positive effects of instruction are usually delayed
  • Take into account learners’ different learning styles
  • Take into account learners’ differing affective attitudes
  • Maximise learning potential by encouraging intellectual, aesthetic and emotional involvement, stimulating both right- and left-brain activities
  • Provide opportunities for outcome feedback

The list of criteria could be infinite unless the evaluation is principled and the evaluator’s principles are overt and referenced to procedures (Tomlinson, 2013). The danger is that an ad-hoc evaluation could lead to misleading results. This has definitely been the case in the past when I have been pre-evaluating coursebooks.

McGrath (2002) says that each individual criterion is a matter of judgement based on the evaluator’s circumstances. Again, it comes down to context and the specific criteria being used. If I were to evaluate materials for use outside of my classroom, would those that do not meet a specific criterion be rejected, or might they be suitable for something else? The rating, weighting and scoring format is of paramount importance, because the responses to the criteria are what determine decisions about those materials. In addition, the interpretation of the data needs rationalised assessment, because high scores in one section of criteria do not automatically indicate suitable materials. What should be studied are the spread of desired features and the concentration of scores in those areas. Care needs to be taken so that answers do not end up non-committal: questionnaires with several options could result in lots of ‘safe’ decisions being made. This opens up a debate about open-ended questions, as they require more investment and are likely to elicit more thoughtful responses. McGrath (2002) simply states that the acid test for the clarity of criteria is to try them out.
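
Thinking about how I would actually combine ratings, I sketched a possible weighted scoring scheme below. To be clear, the weights, section names and the ‘spread versus concentration’ check are my own invented illustration, not a published scoring method from McGrath or Tomlinson.

```python
# Sketch: combining checklist responses into weighted section scores, then
# checking whether weaknesses are concentrated in one section. All data,
# weights and thresholds are invented for illustration.

# Responses on a 1-5 scale, where 1 = very suitable and 5 = not at all,
# so lower averages are better. Weights reflect articulated principles.
responses = {
    "learner_fit":  [1, 2, 1],
    "instructions": [2, 2],
    "engagement":   [5, 4, 5],   # a weak section that a raw total could hide
}
weights = {"learner_fit": 3.0, "instructions": 1.0, "engagement": 2.0}

section_averages = {name: sum(s) / len(s) for name, s in responses.items()}
overall = sum(weights[name] * avg for name, avg in section_averages.items())

# A strong score in one section should not mask weakness elsewhere:
weak_sections = [name for name, avg in section_averages.items() if avg > 3.5]

print("Section averages:", section_averages)
print("Weighted overall score:", overall)
if weak_sections:
    print("Concentrated weakness in:", weak_sections)  # prompts a closer look
```

The overall number alone says nothing about where the weaknesses sit, which is exactly why the interpretation of the data needs the rationalised assessment described above.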

The ordering and the number of questions in each section or category should be determined purely on merit; there should be no strict rule enforcing equality for each category or item, because in fact not every part of the evaluation is as important as another. This is where the articulated principles should play a part in weighting the different items present in the evaluation.

To make my checklist as reliable and valid as possible, the criterion-referenced checklist itself needs to be evaluated. Tomlinson and Masuhara (2010) advise asking five clear questions of your evaluation criteria:

  1. Is each question an evaluation question?
  2. Does each question only ask one question?
  3. Is each question answerable?
  4. Is each question free of dogma?
  5. Is each question reliable in the sense that other evaluators would interpret it in the same way?

Tomlinson’s (2012) state-of-the-art article states that it is rare for checklists to satisfy all of these questions, which goes some way to further underlining that most evaluation checklists are not generalizable or transferable. Therefore, using other people’s published checklists is not the job done. There is still plenty of work to be done.

This week’s section of the course has been really tough. Some of my peers did fantastic jobs on their evaluations. They cross-referenced their findings and made valid and reliable judgements for their teaching and learning contexts. That is the thing: there is so much of them in there, and that is what makes the evaluation what it is. The beliefs and principles that they set out from the start were completely different from those I articulated the previous week. There is no way of knowing, had we worked together, whose principles would have taken precedence. The thing that scares me is the evaluating of the evaluating: the evaluation-checking questions needed to check the actual evaluation questions. Does it tumble out of control until we need evaluative evaluation-checking questions? If validity and reliability are to be trusted, bias needs to be limited as much as possible. I am not sure what my evaluation criteria will look like yet. Once I have created some materials I will evaluate them accordingly.

My general criteria (based on my articulated principles) would be:

  • They are contextually relevant to the learners’ needs.
  • Materials should provide opportunities for communication to take place.
  • Target language has a real-world use and function beyond the classroom.
  • Variety of approaches and types of task (this is more for a coursebook, I suspect)
  • Engagement – images, video, topics.

References

McGrath, I. (2002) Materials evaluation and design for language teaching, Edinburgh, Edinburgh University Press.

Tomlinson, B. (2012) Materials Development for Language Learning and Teaching. Language Teaching, 45 (2), 143-179.

Tomlinson, B. (ed.) (2013) Developing materials for language teaching, London, Bloomsbury Academic.

Tomlinson, B. & Masuhara, H. (2010) Research for materials development in language learning: evidence for best practice, London, Continuum.

Materials Evaluation (Pt1)

This is a big one, an important one, and sadly, it is this area of materials design that makes my head hurt slightly. There is so much that goes into an honest and reliable evaluation. It is one of the key aspects of materials design that I identified in the first week of the course.

Over the years, in my many teaching roles, I have been involved in many informal coursebook evaluations. These were generally part of my institution’s staff meetings and training segments. Proposed future coursebooks were handed out in the meeting, and we were then asked to give our opinions and suggestions about which book we would prefer to use in our classes. Despite the clear lack of training, and for some of us experience, an unsystematic few minutes’ flick-through was all that was provided for what should be a very important decision. The chosen coursebook would go on to form part of the syllabus for those classes and would hence dictate the topics, language and delivery in those classes. Teachers were obliged to use them and the students were prompted to buy them.

Thinking and reading about evaluation has led me to the realisation that I evaluate materials every day. The flexible manner in which my skills classes are structured means that I can use a variety of resources, either published or teacher-authored. This requires daily evaluation of what learners may need, cross-referencing those needs with the suitability of the content, pedagogy and methods of the materials. However, I do not feel that I have previously based my decisions on anything more than instinct or experience. That is not to say those things are not important and valid, but I am keen to move forward. I need to recognise my articulated principles (as discussed in my previous posts) and apply them to my evaluation criteria as a means of having a systematic approach. If I am honest with myself, I do not always consider the whole paradigm of methods, e.g. I may opt for a more affective approach over a cognitive one, or function over form. With a more rigorous examination of other evaluators’ principles, criteria and checklists, I hope to dig deeper into my understanding of developing a well-rounded approach to evaluation.

Tomlinson (2013) states that no two evaluations will ever be the same, as needs, objectives, backgrounds and preferred styles differ from context to context. No matter how structured, criterion-referenced and rigorous an evaluation is, it will essentially be subjective. A starting point has to be asking: who are the materials being evaluated for? The main point is that globally published materials cannot be evaluated in such a way, yet their effect is significant for those who come into contact with them (learners, teachers, the syllabus, the evaluation itself). This echoes my sentiments about the coursebook evaluations I have experienced.

An evaluation is not an analysis, because the objectives and procedures are different (Tomlinson, 2013). Analysis tries to offer objectivity by asking what the materials contain and what they aim to achieve. This can be answered factually with “yes” or “no” responses or a verifiable description. Here it should be acknowledged that any questionnaire written by evaluators might still be influenced by their own ideology and experience, and accordingly be seen as biased. Evaluation questions, on the other hand, are about making judgements: answering on a sliding or Likert scale (for scores to be totalled) to measure something such as “Are the listening texts likely to engage the learner?”, choosing a grade from 1 to 5 (1 = very likely, 5 = not at all). This is something I struggle with myself when presented with scales. The criteria can be very specific, but an element of subjectivity can still creep into your answers. Over a larger-scale questionnaire this may have a significant impact on the outcome of the evaluation.
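
To see how much difference that creeping subjectivity could make, here is a small sketch that compares several evaluators’ Likert responses to the same questions and flags the ones they disagree on most. The response data and the disagreement threshold are invented purely for illustration.

```python
# Sketch: flagging evaluation questions that evaluators may be interpreting
# differently, using the spread (standard deviation) of their 1-5 responses.
# Evaluator scores and the threshold of 1.0 are invented assumptions.
from statistics import stdev

responses_by_question = {
    "Are the listening texts likely to engage the learner?": [1, 2, 5],
    "Are the instructions succinct?":                        [2, 2, 3],
}

for question, scores in responses_by_question.items():
    spread = stdev(scores)
    flag = "  <- low reliability: reword or gloss?" if spread > 1.0 else ""
    print(f"{question} scores={scores} spread={spread:.2f}{flag}")
```

A wide spread on a question suggests it is not reliable: different evaluators are interpreting it in different ways.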

The unique situation of learning and teaching means that evaluators (professional or amateur) will adhere to their own conscious or subconscious principles. This in turn will drive the criteria used to glean the appropriate information for making evaluative judgements. The more experienced a teacher is, the more likely they are to bring the bias of that experience to an evaluation, potentially rendering it less valid. Tomlinson (2013) advocates that this possible bias should be articulated from the start in order to give the evaluation greater validity and reliability, and less scope to be misleading.

McGrath (2002) cites Cunningsworth’s view that course materials are not intrinsically good or bad; rather, they are more or less effective in helping learners to reach particular goals in specific contexts. This is a very sobering thought that I must comprehend before I am drunk on evaluating power. The materials that others have designed are suitable for other contexts, maybe just not my classroom, and therefore my judgement of them needs to be appropriate. It is important for me to note that when I evaluate any materials, my teaching requirements (academic skills and IELTS examinations) may not match the materials I am presented with, and vice versa. This does not mean that those materials should be judged on that context alone; I don’t evaluate all the clothes in a department store when all I want to buy is gloves. The merits and demerits of socks are just not worth comparing in that context.

Evaluations differ in purpose, personnel, formality and timing, e.g. helping a publisher, doing one’s own research, developing one’s own materials, or writing a review for a journal. There are three stages at which an evaluation can, or should, take place. The most common one, it seems, in the institutions I have worked in is the pre-use evaluation, and from talking to colleagues that is their experience as well.

There are two dimensions to a systematic approach to materials evaluation that I must define before using them. The macro-dimension consists of a series of stages (the approach in the broad sense); the micro-dimension is what occurs within each stage (the steps or techniques employed). The pre-use, in-use and post-use evaluations are macro-dimensions, while the criteria within each of them are micro-dimensions (McGrath, 2002).

A pre-use evaluation makes predictions about the potential value of materials and can be:

  • Context-free – reviewing for a journal
  • Context-influenced – reviewing draft materials for a publisher with target users in mind
  • Context-dependent – selecting materials for use in a particular class (Tomlinson, 2013)

Pre-use evaluations are often conducted on an impressionistic level as part of the quick-flick culture, which is the situation I have encountered most. Tomlinson (2013) describes it as “fundamentally a subjective rule of thumb activity.” However, McGrath (2002) notes that checklists and more in-depth criterion-referenced evaluations can be used. These can hopefully reduce subjectivity and offer a more principled, rigorous, systematic and reliable basis for judgements to be made.

In fact, McGrath (2002) supports a procedure that involves conducting a materials analysis first, followed by a first-glance evaluation, user feedback and evaluation using context-specific checklists. This is not the only method in use; others include:

  • Riazi (2003) surveys the teaching and learning situation, conducting a neutral analysis and then carrying out a belief-driven evaluation.
  • Rubdy (2003) uses a dynamic model of evaluation with categories for psychological validity, pedagogical validity, and process and content validity.
  • Mukundan (2006) uses a composite framework combining checklists, reflective journals and computer software to evaluate ELT textbooks in Malaysia.
  • McDonough (2013) develops criteria evaluating the suitability of materials in relation to usability, generalizability, adaptability and flexibility.

(All cited in McGrath, 2013)

The second type of evaluation is the in-use one; this offers a more objective and reliable perspective on the materials. It has the potential to make use of measurements rather than relying on predictions alone, because it can reflect on the materials as they are being used and on their immediate reactions and effects. What can be measured in an in-use evaluation includes:

  • Clarity of instruction
  • Clarity of lay-out
  • Comprehensibility of texts
  • Credibility of tasks
  • Achievability of tasks
  • Achievement of performance objectives
  • Potential for localisation
  • Practicality, teachability, flexibility and appeal of materials
  • Motivating power of materials
  • Impact of materials
  • Effectiveness in facilitating short-term learning.

This is not to say that this type of evaluation is without limitations. It can make judgements about criteria that are observable, and about the materials’ effects on short-term memory. However, it cannot claim to measure effective learning, nor what is happening in the learner’s brain, due to the delayed effect of instruction.

Post-use evaluation is probably the most valuable (yet least administered) way to make judgements on the potential affordances and pertinence of materials for your classroom. It is the least administered because it is not economical or pragmatic for most institutions; it takes time and expertise to complete successfully. Administered effectively, it has the potential to note short-term effects with regard to motivation, impact and achievability. It can also examine and feed back on the long-term effects of durable learning and application. It can answer important questions such as:

  • What do learners know that they did not know before using the materials?
  • What do learners still not know despite using the materials?
  • What can learners do that they could not do before?
  • What can learners still not do despite using the materials?
  • To what extent have the materials prepared learners for examinations?

The benefit of a post-use evaluation is that it could measure actual learning outcomes in various ways: testing what the materials have taught, testing what the learners can do, interviews, questionnaires, criterion-referenced evaluations by the user, etc. That type of data would provide reliable and robust feedback for decisions about the use, adaptation or replacement of the materials. There is still a need for caution, because learning is not an exact science, and variables such as teacher effectiveness, parental support, language exposure outside the classroom and intrinsic motivation may affect outcomes in numerous ways.
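
If I ever get to run a post-use evaluation, I imagine the outcome-measurement part looking something like the simple before/after comparison below. The learners and scores are invented, and, given the caveats above, any gains could not be attributed to the materials alone.

```python
# Sketch: comparing pre- and post-use test scores to estimate what the
# materials may have taught. All data is invented; gains here are confounded
# by teacher effectiveness, outside exposure, motivation, etc.
pre_scores  = {"Ana": 52, "Bo": 61, "Cai": 47}
post_scores = {"Ana": 63, "Bo": 60, "Cai": 58}

gains = {name: post_scores[name] - pre_scores[name] for name in pre_scores}
average_gain = sum(gains.values()) / len(gains)

print("Per-learner gains:", gains)
print(f"Average gain: {average_gain:.1f} points")
# Learners showing no gain would merit qualitative follow-up (interview,
# questionnaire) before deciding to adapt or replace the materials.
```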

What is apparent is that for any evaluation to be systematic, it is preferable for a criterion-referenced checklist to be employed, which will aid the gathering of data. Much as in the design of materials, for an evaluation to have any coherence and validity it needs a framework of ELT principles against which to be compared.

Based on my principles and what I have read so far, I would want to employ all three stages of evaluation of any material. This is obviously a long and time-consuming process. A pre-use evaluation is what my colleagues and I encounter most often. In line with my beliefs I would look for the following criteria in a pre-use evaluation:

  • Does the input appear to engage the learners?
  • Is it appropriate for their current level?
  • Is the input accessible and engaging?
  • Will the input and the tasks motivate/prompt students to interact with target language?
  • Do learners have opportunities to use target language in communicative tasks?
  • Are those communicative tasks useful for the language outside of the classroom?

In my next post I will look at the criteria and frameworks that can be used to evaluate materials. I will also share what some of my colleagues and I discussed in our seminar about materials evaluation.

References

McGrath, I. (2002) Materials evaluation and design for language teaching, Edinburgh, Edinburgh University Press.

McGrath, I. (2013) Teaching materials and the roles of EFL/ESL teachers: theory versus practice, London, Continuum.

Tomlinson, B. (2012) Materials Development for Language Learning and Teaching. Language Teaching, 45 (2), 143-179.

Tomlinson, B. (ed.) (2013) Developing materials for language teaching, London, Bloomsbury Academic.