Validating Assessments

Once classroom assessments have been written, they need to be validated – that is, teachers need to administer the assessments, determine their effectiveness, and correct any problems that surface. A thorough way to do this is to conduct an item analysis. That process involves examining every test item, recording data, and then interpreting the data. Asking the questions that generate the data is actually the easier part. Sample questions include:

  • What percentage of students incorrectly responded to the item?
  • Were their “mistakes” similar in nature?
  • Did students miss the same types of items throughout the assessment?

Interpreting the data is more difficult. Were the errors the result of:

  • the curriculum, i.e., expectations were not clearly stated in the first place?
  • the instruction, i.e., teaching strategies were not effective?
  • the assessment, i.e., questions or tasks were confusing?
  • the students’ lack of preparation, i.e., not completing assignments or practicing a skill?

Item analysis is often a task completed through a group effort, such as by members of a professional learning community, or a grade-level team or department. It is an excellent tool; however, it is time-consuming and therefore often completed for only sample assessments. In the meantime, what could an individual teacher do, with each new assessment, to determine whether the assessment itself is effective or in need of revision? Here are a few suggestions.

  1. In trying to determine whether an assessment is well written, go directly to the “consumers,” otherwise known as the students themselves. Create a short, one- or two-question handout to distribute after the assessment has been administered and scored. For a paper-and-pencil test (selected response or constructed response), ask students whether any test questions were confusing and, if so, to explain why. Similarly, if the assessment is a product or performance, ask whether anything was confusing about the task (the description of what students were to do) or the rubric used for the evaluation. Yes, at some point you’re bound to get a reply from an unprepared student who will try to shift the blame from themselves to the assessment, but that should be easy to discern, since either no one else will register the same complaint, or the explanation won’t ring true. However, if several students all comment on the same test item, task description, or rubric descriptions, that should raise a “red flag.” You should consider rewording for clarity.
  2. For selected or constructed response assessments, create a spreadsheet or chart listing assessment item numbers. As you evaluate the papers, keep a simple tally of incorrect responses next to each item number. When finished, ask yourself: What percentage of students gave incorrect answers to a specific item? If the percentage for any item is high, look at the item objectively to see if the wording might be unclear. Then note what students said about it on the handouts described in #1. If students also listed that particular item as being confusing, then the wording of that item most likely should be changed.

  3. If using a rubric to evaluate student achievement, ask yourself: At any point in the evaluation process, did you have difficulty choosing between two different ratings? If so, perhaps your rubric descriptions are not distinct enough.

  4. If the assessment is a product or performance, did multiple students (any number greater than one) do something quite different from what you expected? If so, look at the task directions objectively. Could those directions be interpreted to lead students down the path they took rather than the one you expected? Once again, note what students said on the handouts described in #1. You may need to make the description of what students are expected to do more specific and detailed, to avoid confusion or misinterpretation.
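The tally described in suggestion #2 is easy to automate if your incorrect-response counts are already in a spreadsheet or gradebook export. The sketch below is one illustrative way to do it; the item tallies, class size, and the 30% flagging threshold are all hypothetical assumptions, not part of the original suggestions.

```python
# Illustrative sketch of the tally in suggestion #2: given a count of
# incorrect responses per item, flag any item that a large share of
# the class missed. The 30% threshold and sample data are assumptions.

def flag_confusing_items(incorrect_counts, num_students, threshold=0.30):
    """Return (item, percent_incorrect) pairs at or above the threshold."""
    flagged = []
    for item, misses in sorted(incorrect_counts.items()):
        pct = misses / num_students
        if pct >= threshold:
            flagged.append((item, round(pct * 100, 1)))
    return flagged

# Example: incorrect-response tallies for a 5-item quiz, 24 students.
tally = {1: 3, 2: 11, 3: 5, 4: 2, 5: 16}
for item, pct in flag_confusing_items(tally, num_students=24):
    print(f"Item {item}: {pct}% answered incorrectly")
```

A flagged item is not automatically a bad item; as suggestion #2 notes, cross-check it against what students themselves reported on the handout from #1 before rewording.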

We all know that if students perform poorly on an assessment, it may be due to the other conditions noted at the beginning of this E-Hint, such as ineffective instructional strategies or simply poor preparation on the part of students. But when an assessment is new, it is important to validate it, to be sure that the wording, use of graphics, or scoring interpretations are not the cause of the problem. The four suggestions above are not as thorough as a true item analysis, but they do provide a good general guideline for checking the quality of your new assessments. Keep in mind that these validation techniques should be used, and necessary revisions made, the first year the assessments are administered. After that, no changes should be made until the target subject comes up again in the curriculum cycle. If you keep changing the assessment every year, your data will not be valid for comparison purposes.