Thursday, October 6, 2011

Chapter IIIc: Evaluating Classroom Teaching


A statement like 'good teaching is in the eyes of the beholder' draws our attention to the need to develop an understanding of how classroom teaching should be evaluated. For this it is necessary to explain the ideas that form the bases for the different modes of evaluation used in classroom teaching. Value judgements, as we know, differ from person to person, because individual standards of evaluation are affected by the psychological factors a person is exposed to or has experienced in the past. But any judgement is based on the information a person can get or has access to; on the basis of the information one has received, one makes a judgement in one's own way.

The standard of evaluation can be defined as a numerical measurement if the information can be quantified. Quantification helps to express the standard of reference specifically and so avoids biases to a great extent. For example, it is more objective to say '25% of classroom teaching time was spent on questioning by the teacher' than to say 'the teacher asked questions quite often', because the same 25% of time could be called frequently, seldom or mostly depending upon the person making the judgement. Dunkerton (1981) came to a similar conclusion, that time on task is important in evaluating classroom teaching behaviours. Words like seldom, frequently or mostly become more meaningful, specific and exact if expressed quantitatively.

Thus it appears that if the amount of time spent on each type of activity is available, an agreeable standard for judgement can be established. For example, evaluators may agree that a classroom teaching period is acceptable if it includes 50% of the time in students doing various learning activities, 30% in teachers doing teaching activities, 15% in asking questions and 5% in non-teaching activities. But this distribution of time also varies from lesson to lesson and with the experience an evaluator has, so flexibility is desired even when the standard is put in exact quantitative form. It also appears that a direction for determining better or less good teaching is essential: in other words, is a classroom teaching-learning situation that spent 60% of its time on student-centred teaching better than one which included student-centred activities for only 50% of the total time? The following bases for gradation seem to comply with the rationales of science teaching (a small worked sketch of such a quantitative record follows the list below).

1. The more time spent on students' learning activities the better the teaching, because the learning activities provide students with opportunities to learn by themselves under the guidance of the teacher.
2. The amount of time devoted to questioning is not too important, but it is better if more time is spent on questioning than on lecturing.
3. The less the teacher-centred activity, the better.
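
To make the idea of a quantified record concrete, here is a minimal sketch in Python. It assumes hypothetical figures: the four activity headings and the 50/30/15/5 reference split come from the discussion above, while the observed minutes are invented purely for illustration.

observed_minutes = {
    "students' learning activities": 22,
    "teacher's teaching activities": 14,
    "questioning": 6,
    "non-teaching activities": 3,
}

reference_split = {
    "students' learning activities": 50,
    "teacher's teaching activities": 30,
    "questioning": 15,
    "non-teaching activities": 5,
}

total = sum(observed_minutes.values())  # length of the observed period, in minutes
for activity, minutes in observed_minutes.items():
    share = 100 * minutes / total
    print(f"{activity}: {share:.0f}% of the period "
          f"(agreed standard: {reference_split[activity]}%)")

Expressed this way, two lessons or two observers can be compared figure against figure instead of adjective against adjective.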

Classroom teaching observation systems can be classified mainly into two types –

A. indirect methods and B. direct methods.

A. Indirect methods

1. Questionnaires – researchers prepare a list of questions in writing to ask an identified person. The person being asked fills in the questionnaire, so the researchers rely on the person who is filling it in. Such techniques require that students or teachers report what occurred in the classroom, with opportunities to judge or give opinions about classroom activities depending upon the types of questions. The answers given are used to describe and evaluate classroom teaching behaviours. Such techniques were used by the International Association for the Evaluation of Educational Achievement (IEA) in an attempt to determine what was happening in the classrooms of the countries studied. A questionnaire sent to science teachers contained questions such as the following (Comber and Keeves 1973):

 “indicate how often you give your students opportunities for planning and carrying out scientific investigations on their own” (page 276).

"Never", "seldom", "occasionally" and "frequently" were the four choices given for responding to the question. Other educationists have used five response categories like "a great deal of emphasis", "strong emphasis", "moderate emphasis" and "little or no emphasis", asking respondents to indicate for each item the amount of emphasis given to it in their own classroom teaching (Adams 1970, page 51). Responses to such questions were used to infer the extent to which classroom activities occurred.

2. Interviews

The same views apply to this technique as to questionnaires. The difference is that the researcher asks the questions on the basis of a prepared list of questions.

There is also a chance that variation might occur between the answers of the interviewees and the perception of the interviewers. The questions may not be structured or exactly prepared, and they may create more chances of bias (Rosenzweig 1948).

Researchers using such techniques, questionnaires and interviews, have faced many difficulties because the results obtained were crude. Many researchers were so dissatisfied with their interviewing procedures that they turned to direct observation (Cooper et al 1974). So serious questions can be raised concerning the validity of inferences drawn from the use of such measures. If the meanings of the different terms are not defined in connection with the research, would they mean the same to respondents as to the researcher? As we all know, individual standards differ: the word "seldom" could mean "five times" to one person and "four times" to another, and so on. So it is less biased to note down the actual number of times activities occurred, in the sequence of occurrence.

Pfau (1977) makes the following remark:

"Major difficulties are faced by researchers using such techniques cross-culturally, and serious questions can be raised about the validity of inferences drawn from the use of such measures. For example, does the term 'frequently' mean the same thing to a British and to an Indian teacher, and thus are responses given by teachers actually comparable?" (page 6)

Similarly, 'strong emphasis' could mean doing the activity for an hour with constant, careful supervision, or it could mean doing it for half an hour without any supervision at all, and so on. Such an analysis leads us to think of recording activities as often as they occur, indicating the amount of time as precisely as possible and in the same sequence as they occur.

Other researchers have also encountered such difficulties in using these indirect techniques of measurement. The difficulties can be appreciated by looking at papers written by researchers who have used them.

Adams (1970) reached the conclusion that:

'the extent to which these reports are veridical (that is, they reflect actual practices) is an open question; others, however, seem to reflect pious bias that conflicts with research'. He warranted the conclusion that response bias was operating, in other words, some countries being freer with their willingness to emphasise anything than were others (p.52).

"What our respondents reported, then, might represent their values rather than their practices. The respective weight they gave to their answers was a function of the educational philosophies to which they had been exposed, tempered no doubt by the social pressures that had subsequently impinged on them." (p.58)

"The lack of difference found, of course, may not be veridical. The teachers, though responding to the same word cues, may have placed different meanings on them. For example, 'free communication' may be semantically quite different in the UK and the USA. Again, teachers may be poor perceivers of their own performance. Thus their reports do not reflect the reality of their teaching." (page 58)

"There is nothing in the study that can lead to the conclusion that these reports should provide a basis for prescribing how teaching ought to occur." (p.59)

In the questionnaire and interview approaches to evaluation, validity also depends upon the type of questions being asked and the type of people being polled. The question "What are the problems of science teaching in your school?" will be answered differently by different people with different concepts of science. The problems of a person who holds the concept of science demanded by the modern view of science, and who can teach as required by the rationales of science teaching, will be different from those of a person who has not had such exposure. The answer to that question will be reliable only when it is given by a reliable or authentic person, as indicated earlier (Adams 1970). So it is much safer to ask questions seeking information that does not require judgement, questions that require just reporting what was observed, e.g. what fruits are available? what animals are there? etc.

Problems encountered using questionnaires and interviews can be found in the chapters written by many researchers, e.g. Cooper et al (1974), pp.27 and 37; Brislin et al (1973), chapters 2 and 5; Warwick and Osherson (1973), chapters 6 and 9; Galton (1979), pages 21 and 109 to 115.

3. Informal observation

This is also an indirect method, but it is so open that it is entirely up to the reporter; the points to be observed are not pre-set as they are in systems like questionnaires, interviews, and the direct systems (Pfau 1977). Power (1977) made the following statements:

".. it can be argued that descriptions of classroom teaching based on schedules are more reliable than those deriving from informal observations." (page 6)

There are also people who believe that 'in the hands of sensitive and skilled workers informal observation becomes a powerful tool capturing the essential qualities of everyday life in the classroom' (Power 1977, p.6). He further says, 'on the other hand, the danger of distortion, reductionism and bias still lurks beneath the surface when reliance is placed on anecdotal, impressionistic records of classroom events' (ibid.).

B. Direct methods

These are methods of receiving information by being present in the classroom while teaching is in progress; the classroom activities are recorded in the same order as they occur. Some of the modes used for recording classroom activities by direct methods are as follows:

1. Rating of classroom teaching by observers
2. Description of classroom teaching by observers
3. Recording by sign system (descriptions by using signs – using a checklist to note down once only in a definite interval of time)
4. Using a category system (recording activities as often as they occur and in the sequence of occurrence as far as possible. The classroom teaching activities that are likely to occur in the classroom teaching-learning situation are grouped into categories. It is a description of classroom teaching by observers using codes for the different activities).

Informal direct observation can also be very useful for decision making, especially for information which cannot be observed at the moment of the visit. However, it could be highly biased if reported as a judgement.
Participants’ observations and description

Direct classroom observation, however, can be recorded and reported in a number of ways. One method often used by educators and anthropologists is to observe classroom teaching and then write descriptive accounts of the teaching observed. An example of a description resulting from such an approach is the following. Reed and Reed (1968) reported teaching in Nepal like this:

"Teaching technique seldom varied; the teacher might read aloud from a textbook while some children took a few notes, or the teacher or a child would chant a standard question and the class would respond by chanting a memorised answer, or the teacher would make a statement and the class would repeat it in unison" (p.135).

Bowker (1984) reports teaching in Nepal: “all the examples of lively, up-to-date teaching with class and group participation, were without exception the work of seminar trained teachers (p.21).”

NSSP (1982): “The teaching of science in Nepal’s schools is similar to the teaching of other subjects. It tends to be didactic, authoritarian, teacher-centred, unrelieved by the use of simple teaching or learning aids and equates memorisation with learning (p.9)”.

Such statements are based on activities that occurred during observation of a classroom teaching-learning situation in progress. If we can record the pattern as it happened, such statements can be formulated by looking at the record, and specific, quantitative records can be used for comparison as well. A descriptive record, however, can also create the kind of semantic misunderstanding explained in the discussion of questionnaires and interviews.

Participant observation studies can involve comparisons of teaching behaviours before and after training, or from one person's research to another's, to show differences or changes. Such comparative description is difficult because of the variations that occur in the presentations of different observers if the technique does not specify what should be recorded. Comparison implies showing how much change occurs between two occasions, which could be before and after a training programme. The descriptions provided depend upon the language competency of the observer and his ability to observe. Descriptions from participant observation are nevertheless very helpful for formulating hypotheses and gathering information (Pelto 1970, Dunkin and Biddle 1974). The technique, however, is not suitable for comparing results before and after a training, or the results of two occasions, where differences or changes need to be shown positively or negatively; such comparisons need specific quantitative data.

Such a description also does not provide a very objective basis for comparing patterns of teaching from one country to another or from one school to another; such data give only general indications of the actual extent to which specific behaviour patterns occur, and even that depends upon the background of the reporter. For example, the type of description above does not indicate how much time was spent by the teacher in reading aloud from textbooks. Was it 50% of a class period or 20%? Furthermore, such descriptions suffer from the problem that the descriptive words employed lack a normative base for making comparisons (Pfau 1977); that is, they lack an explicit language of comparison. Thus, as Deutscher (1973) has pointed out, the standard for the subjects or behaviours may well vary from culture to culture, from nation to nation and, for that matter, within any given social unit between classes, age groups, sexes and so on: "what is 'cold' soup for an adult may be too 'hot' to give to a child" (p.174). Similarly, what is 'seldom' to one person may appear 'frequent' to another. It is apparent that more systematic observation, recording and reporting procedures are required if precise descriptions and comparisons of classroom behaviours are to be made (Pfau 1977).
The following statements give a picture of the teaching activities but again cannot be used for specific statistical comparison. Trowbridge (1974) describes his teaching experience in Nepal as follows:

"For the first few months, I would spend hours before class memorising four or five sentences which I had prepared hoping to get the point across in the most concise way. Then I would come into class, usually clutching some 'science objects', ostensibly to use as a visual aid or demonstration, but, in fact, just as much for my own moral support and security: something to point at and name when sentences failed me. Falteringly uttering the statements I had prepared, I hoped that some of the quicker students would catch on. Surprisingly they did, and when they figured out what I was trying to say, they would tell me in clear, faultless Nepali what it was. Then I, in turn, would repeat what they had just said; and they would lean back in their chairs, content with what I had just taught them. That routine went on for a couple of months until I was able, on my own, to introduce a lesson, present a problem, lead a discussion about it, explain a concept, and drive home an important point in satisfactory, if not eloquent, Nepali. I depended a lot on showing things, using some kind of experiment, demonstration, chart, blackboard drawing, or specimen in nearly every classroom session."

Pfau (1977) makes the following statement:

“such descriptions, however, do not provide a very objective basis for comparing patterns of teaching from one country to another, or indeed from one school to another, since they furnish only gross indications of the actual extent to which specific behaviour pattern occurs (p.8)”.

Systematic Classroom Observation Instruments

Systematic classroom observation instruments are generally of three types: the 'rating system', the 'sign system' and the 'category system'. They are called 'systematic observation' or 'interaction analysis'.

1. Rating system –

Observers using a rating system usually estimate the frequency of events and the extent of attributes only once, at the end of an observation session (Rosenshine and Furst 1973). For example, from Bowker (1984):

“ 4.11 teaching”
"The quality of one or more lessons given by a teacher was assessed independently by two observers. Each gave a mark for teaching out of 4 for content, 6 for method, and a second, impression mark for opinion. In the latter mark credit was given for enthusiasm, liveliness, and degree of overcoming difficulties as well as performance. The two observers' marks were averaged; this was repeated for the second science teacher where one was in post and teaching during the visit." (Bowker 1984, p.8)

"The four members of the assessment team visited in all 10 schools, giving marks according to the schedules described in section 4.11. The average mark given to trained teachers was 7.4, and to untrained teachers 4.4. These were figures from four independent raters, awarded to 22 trained and 16 untrained teachers …" (p.21)
Bowker's (1984) schedules used in Nepal do not provide information about how the judgement behind each score was made. Such statements cannot be used for comparison with another person's teaching, and the scoring gives no clues to what actually happened in the classroom teaching.

Dunkin and Biddle (1974):

"Rating systems call for high-inference judgements, requiring that an observer integrate whatever he has witnessed over one or more periods of observation and provide a record of general impressions …" (p.50)

Rosenshine (1970):

"Rating systems are classified as high-inference measures because they lack specificity. Items such as 'clarity of presentation', 'helpful towards students' or 'enthusiasm' in 'rating instruments' require that an observer infer these constructs from a series of events. In addition, an observer must infer the frequency of such behaviours in order to record whether they occurred 'constantly' or 'sometimes' or 'never' or whatever set of gradations is used in the scale of an observation instrument." (page 281)

'Although rating systems are no longer limited to high-inference items and high-inference items have been used in some category systems' (Rosenshine and Furst 1973, p.133), 'the amount of inference inherent in the use of these instruments still serves as a useful distinguishing feature in most cases.' Rating systems suffer from a number of inherent defects. As indicated before, 'ratings' are estimates of the degree to which a person or thing possesses given characteristics (Remers 1963, p.329).

These estimates are made and recorded usually once, at the end of a period of observation (Pfau 1977).

Medley and Mitzel (1963) have pointed out that it is desirable that behaviours be recorded as soon as possible after they occur.

It is known that many factors can affect memory and may seriously distort a record made in retrospect. These distorting factors result in what Kerlinger calls the 'intrinsic defect of rating scales', this being their proneness to constant or biased errors (Kerlinger 1973, p.548).

In addition to halo effects, which affect rating scales, Kerlinger (1973) points out other types of error often associated with rating scales. These include the error of severity, "a general tendency to rate all individuals too low in all characteristics", the error of leniency, "an opposite tendency to rate too high", and the error of central tendency, "a general tendency to avoid all extreme judgements and rate right down the middle of a rating scale" (Pfau 1977, p.9).

Pfau (1977) concluded as follows:

"When different recording biases occur, as they are likely to, in different ways by persons with different cultural backgrounds, the utility of rating systems for making cross-cultural comparisons is seriously undermined. Add to these problems the difficulty of providing operational definitions of the high-inference concepts used in most rating systems, and it can be seen that studies which call for observers to use rating systems may result in judgements that are unreliable as well as biased" (p.12).

2. Sign System

In a sign system the classroom activities are recorded within a unit of time, as in the category system. For example, in the Science Teaching Observation Schedule, STOS (Eggleston et al 1976), the classroom activities occurring within a three-minute unit of time are recorded only once on a sheet of paper that carries a list of activities. The record therefore does not provide the total frequency of occurrence or the sequence of the activities that occurred. The amount of time spent on different classroom teaching-learning activities is important as a basis for evaluating teaching, and the sign system is not strong on this aspect. However, the list used in this system can be more detailed than in a category system, thus providing details of more activities, but without the total frequency and sequence of occurrence; exact details of all teaching-learning activities cannot, in a practical sense, be listed. The sign system, in other words, is an improved checklist. This system of recording can be very useful for checking whether particular activities occurred or not, but making generalisations about teaching and learning on the basis of this type of instrument is very difficult, as the recorded information cannot show how classroom time was distributed among different activities. Sign systems have been used much less than category systems, are less precise instruments, and do not lead to the study of sequential events in the classroom (Dunkin and Biddle 1974, Pfau 1977). The STOS has no clear theoretical background but evolved from empirical observation of science lessons and was thus based on the intellectual processes of science, which include observing, constructing, hypothesising, speculating, designing, experimenting etc. The STOS has been criticised for distorting classroom processes because of its large sampling interval of three minutes (Dunkerton 1981, Galton 1979). So this instrument may not be usable in other countries, especially in developing countries, where the situation is different from that of the developed countries where the tool was developed. A tool of this type is place-based, so it is not universal and hence not scientific.

Ajeyalemi and Maskill (1982) write:

'Another feature of science classrooms in developing countries is the language of instruction. Often English is the only medium of instruction in classrooms and is thus "foreign" to both teacher and pupil. Some of the problems of learning and teaching science through such a foreign medium have been enumerated by Stevens (1976). Classroom research in mother-tongue English situations has shown that there are conceptual gaps due to language difficulties between the teacher and pupils in the realisation of subject matter, especially in science subjects (Cassels and Johnstone 1977) with their technical or specialist "registers" (Barnes 1969). How much more could this be the case in science classrooms where both teachers and pupils do not have English as their first language, and where a teacher might even come from a different language background from those of his pupils?'

"Another problem that exists is associated with the research methodology of classroom observation. The majority of the reported studies of science classrooms used one or another of the systematic structured methods such as STOS (ibid.) with pre-defined categories of behaviour to be observed. These research tools have been developed and validated in particular classroom situations, usually those of Western European or North American classrooms (Hacker et al 1979). Observers look for and take note of 'science' activities as seen through Western eyes. Maybe if the model had been developed within the culture of a developing nation, very different categories of science behaviour would have been specified. Thus it may well be that not only are the results of research done in the West unusable when reported, but also that the research tools themselves are inappropriate outside the countries they were developed in (p.261)."

So it is wise to develop one's own system.

It seems that the observation tool selected needs to be based on the rationale for science teaching rather than on a particular situation, so as to give the tool more content validity, and it needs to be appropriate for the cultural context.

3. Category system (Descriptions by Coding the Categories of Classroom Activities)

Observers using category systems keep a record of specific events each time they occur, or at very frequent intervals (e.g. every 3 or 5 seconds). It is very difficult to record faster than every 3 seconds, and obviously the shorter the recording interval, the more of the detail of the classroom activities is captured.

"… observers using category systems must make specific observations while teaching-learning activities are in progress in a classroom" (Dunkin and Biddle 1974, p.60).

The records made with a category system are available to anyone who wishes to make his own interpretation and evaluate the teaching. This gives the freedom to allow for the variations that are likely to occur because of individual differences between beholders.

In the past, rating systems have been distinguished from category systems mostly by the amount of inference inherent in their use and in the interpretation of their results. As Rosenshine (1970) has pointed out:

"Category systems are classified as low-inference measures because the items focus upon specific, denotable, relatively objective behaviours such as 'teacher repetition of student ideas' or 'teacher asks evaluative questions', and because these events are recorded as frequently as they occur" (p.281).

Logical suitability of category system

It appears that the essential point in observing teaching is to gather information on what actually happened in the classroom during teaching, not to make judgements or form opinions at the time of recording. One can use sign or category systems to determine how much time is spent on different activities by teachers and students; this specification in relation to time is available only in the sign system and the category system. But only the category system provides, as far as is practical, the total frequency and the sequence of occurrence. Data obtained from both types of instrument can be used for comparative studies, as they are specific and have a standard language. To clarify the notion of a standard language, Przeworski and Teune (1970) have pointed out that:

'whether two or more phenomena are "comparable" depends on whether their properties have been expressed in a standard language. A language of measurement defines classes of phenomena by providing specific criteria for deciding whether an observation can be assigned to a particular class … it is a standard language if it can be consistently applied to all individuals or social units … classifying observations into categories, ranking them, or counting instances serves to express observations in a language of measurement … if these observations are expressed in a standard language, they are indeed comparable (p.93).'

The notion of a standard of effectiveness varies from individual to individual, so an opinion is not expressed in a standard language and thus is not comparable. Pfau (1977) says:

"Category systems usually do provide specific and low-inference criteria for deciding whether certain classifications of behaviours have occurred or not. Based upon the frequency of occurrence, the extent of these classifications of behaviour can be stated using a standard language. Therefore, category systems do appear suitable for comparisons" (p.13).

Dunkerton (1981) has also suggested the use of a quantitative type of classroom observation schedule, as it seems impossible to link classroom teaching behaviours and teacher effectiveness without quantitative recording of classroom behaviours. A structured method helps the user to do this (Nash 1973, Kyriacou 1983, Bassey and Hatch 1979).

Conclusion

Thus, on the basis of the above analysis, a category system for classroom teaching observation can be adopted for recording classroom teaching activities during observation and evaluation of classroom teaching. Kyriacou (1983) and Cooper et al (1974) also seem to agree that a category system is more useful for analysing and improving teaching than any other observation system. Probably it is the best observation system that can be recommended at the moment.

Therefore, one's own classroom teaching observation tool needs to be developed by considering the points discussed above, so that it is valid with respect to the rationales of teaching and reliable, that is, so that the records do not differ significantly between observers using the same tool. The method of evaluation suggested should be unbiased: it should be based on quantification.
A System of Classroom Teaching Observation and Evaluation.

Introduction

Evaluation of classroom teaching requires observation of the teaching in the real classroom situation. This demands a classroom teaching observation system. The evaluation of teaching ought to be based on the actual classroom observation done; obviously, the observation records must reflect the activities that occurred in the classroom during teaching.

Usually the references for judging the classroom teaching are the rationales of teaching and learning. So a classroom teaching evaluation system is meaningful only when it is based on the rationales of teaching and learning.

The rationales are the general guidelines recommended by the teaching technologists. The rationales of teaching demand that certain activities be done during classroom teaching.

Thus, the system of teaching observation and evaluation is valid only if it tallies with the recommended teaching-learning activities; this is essential to give the tool acceptable content validity. The evaluation system used in a training programme largely reflects the nature of the training programme itself: learners learn the way they are tested, and the system of evaluation significantly directs the style of learning. A training programme that evaluates teaching in the direction of the rationales of teaching and learning means that the training activities are directed towards the same principle. Indeed, it has to be that way. Thus, an evaluation system based on the rationales of teaching and learning means that the training programme is also guided by the same philosophy.

Learning activities

There are certain learning activities recommended by teaching experts. It is obvious that the more the students are involved in those activities, the better their learning is. The priorities given to these learning activities differ from subject to subject. Some of the learning activities that help learning are as follows:

1. Doing experiments or practical work.
2. Making or constructing materials.
3. Fitting or fixing up materials.
4. Doing demonstrations.
5. Reading, writing or study work.
6. Speaking, asking questions and giving answers.
7. Drilling, doing workbook exercises or examples.
8. Verification of results or answers.
9. Discussions.
10. Listening.
11. Role playing or dramatising.
12. Observations, field trips or project work.

Teaching activities

Similarly, some of the teacher's activities that can assist learning are demonstrations, lectures, questioning, directions, guidance and help, teacher's answers, writing on the blackboard, tests, checking notebooks of class work or homework, correcting the students' written work, etc.

Criteria for the Development of the tool

The activities listed above occur in classroom teaching one way or another. The time given to different activities may vary depending upon the teaching style. For example, lecturing, or teacher speaking, dominates the general teaching style in developing countries like Nepal. It is well accepted that such lecturing is the least helpful for learning. On the other hand, it is also a well-established notion that learners learn best when involved in doing activities, and that seeing learning activities helps learning more than just listening. Therefore, it is a logical requirement of an observation system that it provides the total time spent on different activities. Based on the theme that the more time spent on learning activities the better the classroom teaching management, a standard for evaluation can be fixed. Therefore, the minimum time that should be spent on students doing activities during classroom teaching needs to be fixed. The standard of measurement may vary or can be varied: some may like to fix the standard at 50% of classroom teaching time for students' learning activities, some may like 60%, and so on; the best, of course, is to spend 100% of the time on learning activities. This type of quantification makes the judgement unbiased, because it is based on calculation rather than on the criteria of one's own opinion rating.

Another criterion for the teaching evaluation system is its reliability. That is, when the system of observation is used by another or a second person, the scores should be more or less similar. Statisticians do not desire differences of more than 15%; in other words, agreement between the observations should be at least 85%, and of course the more the better. There is a method in statistics to calculate inter-observer agreement. It is known that the more parameters there are, the higher the reliability, since the measurement becomes increasingly specific with the addition of parameters; therefore statisticians recommend at least 10 parameters, or that size of pool, for a better chance of reliability.
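
As a rough illustration of the inter-observer agreement mentioned above, the following sketch (Python) computes simple interval-by-interval percentage agreement between two observers who coded the same lesson with the same category instrument. The two code sequences are invented, and simple percentage agreement is only one of several statistics that could be used; the 85% threshold is the figure suggested in the text.

observer_a = [10, 10, 7, 6, 6, 1, 1, 1, 11, 10, 10, 7, 6, 1, 1, 1, 1, 10, 11, 11]
observer_b = [10, 10, 7, 6, 6, 1, 1, 2, 11, 10, 10, 7, 6, 1, 1, 1, 2, 10, 11, 11]

# proportion of recording intervals on which the two records agree
matches = sum(a == b for a, b in zip(observer_a, observer_b))
agreement = 100 * matches / len(observer_a)
print(f"inter-observer agreement: {agreement:.1f}%")   # 90.0% for these records
print("acceptable" if agreement >= 85 else "below the 85% threshold")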

It is also a well-established notion that if the evaluation criteria are known, learners can accomplish objectives much more satisfactorily than those who do not know the objectives, the criteria or what is to be achieved. Therefore, the teaching evaluation system should help teachers learn about teaching and thus improve their teaching styles. This is possible only when the evaluation criteria for teaching performance are based on the rationales of teaching and are specific and clear about what is considered good teaching.

Since the observation system needs to reflect the classroom teaching activities that occurred, the sequence in which activities occurred may also be demanded; the sequence of activities is also important in learning.

An opinion-based evaluation system is biased. Opinion rating is usually not accepted as a fair evaluation; there is a lot of room for manipulation, and a second person would not know the bases of judgement in an opinion rating.

Categorising the Teaching Learning Activities

During observation of classroom teaching, it is very difficult to record in words the classroom activities that occurred and to indicate the time taken by those activities. One way to simplify the recording is to code the activities. Code numbers can be recorded at a definite interval of time as the corresponding activities occur. Recording code numbers is easier, faster and more specific. This way it is possible to calculate the time taken by those activities, and the record also indicates the sequence in which activities occurred.

Code numbers for too many activities are difficult to memorise for efficient use. Grouping or categorising similar activities together makes the list short; up to about 15 categories can be used in a practical sense.

Flanders' Interaction Analysis Categories uses 10 categories, with a recording interval for verbal activities of 3 seconds. Recording at a longer interval of time will certainly make the recording easier.

Some rules are essential for using the observation tool accurately. Rules are required for situations like the following:

- when two or more activities occur within the recording interval of time, when it is difficult to calculate the time taken by each activity, or when activities occur that do not correspond to the categories or cannot be assigned to them at the time of observation.

Categories

More or less all the teaching-learning activities that occur during classroom teaching are included in the following 11 categories or groups. All the activities correspond to the rationales of teaching and learning, so the instrument is valid. Rearrangement until satisfaction can be done through rigorous practice in observation. Inter-observer agreement can be as high as 95%, or even more.

Student's activities

1. Practical work - experiments, playing games, role play, drama etc.
2. Making or constructing - materials, charts, collections etc.
3. Fixing / fitting - apparatus, materials, blocks etc.
4. Demonstrations - presentation of any work by the students.
5. Library work - own study / writing work (not copying).
6. Speaking - answering or asking questions, or when students speak out, read out or drill in language.

Both teacher's and student's

7. Teacher questioning - teacher asks questions.

Teacher's activities

8. Workbook exercises - question answer written exercises as just repetitions.
9. Teacher demonstrations - experiments or any materials or activities.
10. Teacher lecture - when the teacher explains or reads out, or a student reads out for the teacher, for information.

Non-teaching

11. When non-teaching activities occur - roll call, notices or other disturbances.

Observation procedure for recording classroom activities

Each category code is noted down at a definite interval of time, as frequently as possible, during classroom teaching observation. Five seconds seems reasonable (my experience is that a longer interval is not necessary and a shorter interval makes recording difficult).
At the end of the observation a series of numbers is obtained. The series gives the sequence of activities that occurred, and counting the occurrences of each code number gives the total time spent on each type of activity.
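
A minimal sketch of this totalling step, assuming an invented coded record and the five-second interval suggested above:

from collections import Counter

INTERVAL_SECONDS = 5
# invented record: one category code (1 to 11) noted every five seconds
record = [10, 10, 10, 7, 6, 1, 1, 1, 1, 1, 9, 9, 7, 6, 8, 8, 11, 11]

counts = Counter(record)
total_seconds = len(record) * INTERVAL_SECONDS
for category in sorted(counts):
    seconds = counts[category] * INTERVAL_SECONDS
    print(f"category {category:2d}: {seconds:3d} s "
          f"({100 * seconds / total_seconds:.0f}% of the observed time)")

print("sequence of activities:", record)   # the order of occurrence is preserved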

Evaluation system

The recommendation of teaching experts that the more time spent on students doing activities the better the teaching is widely accepted. In other words, teaching that gives more time to students doing activities is valued as better than teaching that gives less time. Based on this theme, some lines of standard can be fixed. Based on the above teaching observation system, the following two ways of scoring the teaching are suggested.

1. It may be said that at least 50% of class teaching time spent on students' learning activities is essential for an acceptable quality of teaching (thus it may be defined that classroom teaching which uses at least 50% of classroom time for students' learning activities is student-centred). So the following scoring system may be suggested (a sketch of how it can be applied follows the table):

Classroom teaching time %    Score %    Comments
50                           50         satisfactory/reasonable
60                           70         very good
70                           90         distinction
80                           100        excellent
more than 80                 ---        extraordinary
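
Read in this way, the table can be applied mechanically. The following sketch is one possible reading of it; how percentages that fall between the listed values, and anything below 50%, should be treated is an assumption, not something the table states.

def score_from_student_time(student_time_pct):
    # thresholds and scores follow the table above
    if student_time_pct > 80:
        return "extraordinary"
    if student_time_pct >= 80:
        return 100   # excellent
    if student_time_pct >= 70:
        return 90    # distinction
    if student_time_pct >= 60:
        return 70    # very good
    if student_time_pct >= 50:
        return 50    # satisfactory/reasonable
    return "below the 50% minimum for acceptable teaching"

print(score_from_student_time(65))   # 70 (very good)
print(score_from_student_time(45))   # below the 50% minimum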

OR,

2. It may be more scientific to give scores on the basis of the ratio between the time spent on students' learning activities and the time spent on teachers' activities, because sometimes many other activities occur during classroom teaching time, such as roll calls, notices, disturbances etc. (Alternatively, the percentage of time may be calculated out of the total time used in the students' and teacher's activities.) A sketch of this scheme follows the table below.

The calculations of time ratio can be done as follows.

Time ratio = total time given for student's activities divided by the total time given for the teacher's activities.

Scoring

Time ratio    Score %    Comments
1             50         pass/good
2             70         very good
3             90         excellent
4             100        extra-ordinary
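
A sketch of this second scheme, again with invented timings; the bands follow the table above, and rounding a ratio down to the nearest listed band is an assumption.

def time_ratio_score(student_minutes, teacher_minutes):
    ratio = student_minutes / teacher_minutes
    if ratio >= 4:
        return ratio, 100, "extra-ordinary"
    if ratio >= 3:
        return ratio, 90, "excellent"
    if ratio >= 2:
        return ratio, 70, "very good"
    if ratio >= 1:
        return ratio, 50, "pass/good"
    return ratio, 0, "more teacher time than student time"

print(time_ratio_score(student_minutes=24, teacher_minutes=12))   # (2.0, 70, 'very good')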

Notes:

Observers should make and agree upon some ground rules; for example, a note may be taken at the end of the observation, undecided activities may be noted down and decided later, project work is noted down indicating the time given or taken, and so on.


Classroom Observation and Evaluation System
ACI (Activity Category Instrument)
Harrie E. Caldwel 1967

Student Centred Activities. (1 to 6)

1. Laboratory Experiences: open-ended.

Students are presented a problem to be solved by experimentation. The procedure may or may not be given. They are required to make observations and analyse or interpret their findings.

2. Laboratory experiences

Students are presented with a laboratory experiment with a structured procedure. They are not required to analyse or interpret their data; they are only asked to make observations.

3. Group projects

One or more groups of students are working on a science subject during the class period. Some may work individually (not written projects).

4. Student demonstrations

A student or a group of students demonstrate a science experiment or project which they have prepared (oral report on science project would be included).

5. Student Library research

a. A student or a group of students give an oral report they have prepared based on reference materials.

b. The class works with reference materials for purposes of writing or making of reports.

6. Student Speaking

The students contribute verbally by asking questions, answering questions or simply volunteering information.

7. Teacher Questioning

The teacher asks students questions.

Teacher Centred Activities (8 to 10).

8. Work book work

Students work in class on workbooks, homework, questions from text, art type works etc.

9. Teacher demonstrations

The teacher presents material by films, filmstrips, records, television, radio, demonstrations, etc.

10. Lecture

The teacher reads aloud, expresses his views, gives directions, makes an assignment, or asks rhetorical questions. Students are expected to listen. They may interrupt only when they do not understand. Students reading the text just for information are also included in this category.

11. General Havoc

The students may be cleaning up, settling down or doing nothing. In general this category should be used sparingly when non-teaching activities occur.


The categories

The major difference between open-ended laboratory experiences (category no.1) and structured laboratory experiences is the amount of freedom granted to students. In a structured laboratory experiment the students have little freedom for investigation or expression, or it is not required; procedures are explicitly described, and the students are not asked to analyse or interpret data, only to make observations. In open-ended laboratory experiences students are presented with one or more problems; they may or may not be required to determine a procedure, but they are asked to analyse or interpret results.

In elementary school science classes children may work individually or in small groups on projects, demonstrations or reports. The category entitled "group projects" (category no. 3) describes the classroom situation when children are preparing science fair projects or demonstrations for the class. When a demonstration, project or oral report on a project is being presented by a student, classroom activity is best described by the category "student demonstration" (category no. 4). When studying animals, many teachers have students choose an animal, look up information about it and prepare an oral or written report on it. The "student library research" category (no. 5) describes classroom activity when students are working with reference materials to prepare a report, or when students are presenting their reports to the rest of the class.

The "student speaking" category (no. 6) describes classroom situations when students are making verbal contributions to the class. They may be expressing views, reading notes aloud, telling stories, describing personal experiences, asking questions or answering questions.

The "teacher questioning" category (no. 7) describes classroom activity when a teacher is attempting to elicit student talk by asking a question orally. It is not classified as either a teacher or a student activity.

The "work book work" category (no. 8) describes classroom activity when students are filling out workbooks, working homework problems, writing answers to questions, copying pictures or colouring pictures. However, category no. 8 does not describe the classroom activity of students writing up the results of a laboratory experience.

Technology has produced instructional aids, audiovisual devices and other media that provide teachers with effective and efficient ways to present information. Included are television, films, filmstrips, phonograph records, tape recordings and radio. Category no. 9, teacher demonstrations, describes the classroom situation when the teacher is using instructional aids. This category also describes classroom activity when the teacher is performing a demonstration. For example, when a teacher is heating a coke bottle capped by a balloon to demonstrate that air expands when heated, the classroom situation is best classified as teacher demonstration.

The category entitled lecture (no 10) describes those activities in which students receive only information or directions from the teacher or textbook. It describes situation when a teacher reads aloud, expresses his views, gives directions, makes an assignment or asks rhetorical questions. Students listen and interrupt only if they do not understand what the teacher is saying. If a teacher writes on the blackboard or overhead projector, classroom activity is still classified as lecture, these instructional aids are not considered teacher demonstration. Students reading the text, aloud or silently for just information or knowledge, and lull during a lecture for students to write notes are also considered as lecture.

General havoc (no. 11) describes intervals of time when the class is interrupted or when nonteaching activities occur. Non-teaching activities would include students settling down or cleaning up, announcements on the public address system, visitors at the door, fire drills, handling out home work or materials, moving from a classroom to a another location, etc. Periods of silence while the teachers get materials ready, erases the board or does some other menial tasks are classified general havoc.

Use of activity categories

The Activity Category Instrument is designed for use by an observer watching a science lesson being taught, either live or recorded on videotape. Every five seconds the observer records a numeral to designate the category which best describes the classroom activity during that five-second interval. For example, if the teacher is asking a question, classroom activity is best described by the category entitled teacher questioning, and the observer records a 7. If students are cleaning up after a laboratory experience, the observer records an 11, general havoc. A series of numerals is thus obtained; this series indicates which types of activities occurred, how often they occurred, and the sequence in which they occurred.

Many times two or more activities occur simultaneously or sequentially in the same five-second interval. When activities occur simultaneously in the five-second interval, observers classify all the activities into their respective categories and record the numeral of the category with the lowest numerical designation. For example, if the teacher is talking during a film, telling or lecturing the students (category no. 10), the observer records a 9, teacher demonstration, and not a 10. If the teacher is asking a question during a demonstration, the observer records a 7, teacher questioning. When activities occur sequentially in the same five-second interval, the observer must decide which type of activity occupied the major portion of the interval, and only the numeral of that activity is recorded. In some instances the observer cannot determine which type of activity occupied the major portion of time because all the activities occupied equal portions; in this case the observer chooses the category with the lowest numerical designation and records that numeral. For example, a teacher stops lecturing in an interval and a student begins speaking. If the teacher's speaking occupies the greatest portion of the five-second interval, the observer records a 10, lecture. If both activities occupy the same portion of the time interval, the observer chooses the activity belonging to the category entitled student speaking, because its numerical designation, no. 6, is the lowest, and records a 6.

Field trips are classified as student library research, category no. 5. Field trips may range from a trip to another classroom where an exhibit is displayed to a trip to an industrial concern in another city. The travel time is not categorised as no. 5 but as general havoc, category no. 11. Tests are not classified; the observer codes a T to denote a test. "Guest speaker" is written across the blank intervals. (pp. 56-59)

Modification (suggested - it may be changed as desired by the users):

Keeping the same system of recording, the tool may be modified in the following way:

I. Student centred activities (1 to 6)
Category 1. Experiment, closed / open.
Category 2. Making educational materials.
Category 3. Fixing or fitting up materials.
Category 4. Student demonstrations.
Category 5. Writing / library.
Category 6. Student speaking.
II. Category 7. Teacher question.
III. Teacher centred activities
Category 8. Exercise work / workbook work.
Category 9. Teacher demonstration.
Category 10. Teacher lecture.
IV. Category 11. Silence / non-teaching activities.
Scoring Activity (time) Ratio = time spent on student activities divided by the time spent on teacher activities
Activity Ratio marks %
1       50
2       60
3       70
4       80
5       90
6       100
7 or more extra-ordinary.
N.B.
a. A ratio of 1.1 will be 51%, 1.9 equals 59%; similarly, other scores may be given in decimals, i.e. 4.1 will be 81%, 5.9 will be 99%, and so on (see the sketch after these notes).
b. Project work may be noted as project work; this activity takes a long time.
c. Marks vs. scores may be fixed as desired.
d. Workbook exercises could be considered as student-centred activities if they are new and done independently.
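
The table and note (a) together are consistent with a simple linear rule, marks % = 40 + 10 × activity ratio, for ratios between 1 and 6, with ratios of 7 or more flagged as extra-ordinary. Treating the table as this formula is an interpretation rather than the author's own wording; a sketch:

def activity_ratio_marks(student_time, teacher_time):
    ratio = student_time / teacher_time
    if ratio >= 7:
        return "extra-ordinary"
    if ratio < 1:
        return "below a ratio of 1 (no mark defined in the table)"
    return round(min(40 + 10 * ratio, 100), 1)   # e.g. ratio 4.1 -> 81.0

print(activity_ratio_marks(30, 20))   # ratio 1.5 -> 55.0
print(activity_ratio_marks(41, 10))   # ratio 4.1 -> 81.0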

Dr. Dev Bahadur Dongol

References


Adams, D. (1970); Education and Modernization, Mass., Addison – Wesley.

Adams, R.S. (1970); Perceived Teaching styles, Comparative Education Review 14(1)

Ajeyalemi, D.A. and Maskill, R. (1982); "A survey of studies of science classroom activities with particular reference to their usefulness for developing countries"; European Journal of Science Education, vol.4, no.3, pp.253-263.

Bassey, Michael and Hatch, Nina (1979); “Interaction Analysis for Infant teachers”; Educational Research, vol.21/ no.2/ February.

Bowker, Mike. (1984); Science Education in KHARDEP area, British Council, London.

Caldwel, H.E. (1967); Evaluation of an In-service Method Course by System Observation of Classroom Activities, Project No. 6-8760, US Department of Health Education and Welfare, Office of Education.(ERIC Document Reproduction Service No. Ed. 024 615).

Caldwel, H.E. (1968); Evaluation of an In-service Method Course by System Observation of Classroom Activities, Syracuse University, Ph.D.

Caldwel, H.E. (1971); "Activity Categories: A Quantitative Model for Planning and Evaluating Science Lessons" in School Science and Mathematics, vol.71(1), January, pp.55-63.

Comber, L.C. and Keeves, J.P. (1973); Science Education in Nineteen Countries, International Studies in Evaluation; A Halsted Press Book, New York, London, Sydney, Toronto.

Deutscher, I. (1973); "Asking Questions Cross-culturally: some problems of linguistic comparability" in D.P. Warwick and S. Osherson (editors), Comparative Research Methods. Englewood Cliffs: Prentice-Hall.

Dunkerton, John.(1981); “Should Classroom Observation be Quantitative?”; Educational Research. Vol.23 no.2, February.

Dunkin, M.J. and Biddle, B.J. (1974); The Study of Teaching, New York: Holt, Rinehart and Winston.

Eggleston, J.F., Galton, M. and Johns, J.E.(1976); Process and Product of Science Teaching, School Council Research Project: MacMillan Education.

Galton, M.J. (1979); “Systematic Classroom Observation”; British Educational Research. 21, 109-115.


Kerlinger, F.N. (1973); Foundations of Behavioral Research. New York: Holt, Rinehart and Winston.

Kyriacou, Chris (1983); "Research on Teacher Effectiveness in British Secondary Schools"; British Educational Research Journal, vol.9, no.1.

Nash, R. (1973); Classroom Observed, London: Routledge and Kegan Paul.

Pelto, P.J. (1970); Anthropological Research: The Structure of Inquiry. New York: Harper & Row.




Pfau, Richard Henry (1977); A Cross-national Comparison of Classroom Behaviours: based upon a survey conducted within Nepal using Flanders' Interaction Analysis. University Microfilms International, 300 N. Zeeb Road, Ann Arbor, MI 48106; 18 Bedford Road, London WC1R 4EJ, England.

Power, C.N.(1977); A Critical Review of Science Classroom Interaction Studies, Studies in Science Education vol.4,pp. 1-30.

Przeworski, A.J. & Teune, H. (1970); The Logic of comparative Social Iinquiry. New York: John Wiley.

Reed, Horace B. & Reed, Mary J. (1968); Nepal in Transition, University of Pittsburgh, Studies in Comparative Education, No.7.

Remers, H.H. (1963); "Rating Methods in Research on Teaching" in N.L. Gage (editor), Handbook of Research on Teaching. Chicago: Rand McNally.

Rosenshine, B. & Furst,N.(1973); “The Use of Direct Observation to Study Teaching” in M.W.Travers(editor), Second Handbook of Research on Teaching, Chicago: Rand MacNally.

Rosenshine, B.(1970);”Evaluation of Classroom Instruction” in Review of Educational Research. 40(2),pp.279-300.

Rosenzweig, S. (1948); "Investigating and Appraising Personality" in T.G. Andrews' Methods of Psychology. New York: Wiley.
