How Do I Create Tests for my Students?

Prepared by Mekiva Callahan & Micah M. Logan

 

Introduction

Assessments are essential to the learning process. As the class instructor, you need a means of gathering information on the effectiveness of your instruction and a way to measure your students’ mastery of the course’s educational outcomes. For students, assessment provides feedback on their learning and can also be an incentive for improving academic performance. From an administrative perspective, the cumulative value of assessments is tangible data suggestive of student achievement. Perhaps the best-known form of assessment is a test or an exam. Given the high stakes of evaluation from all of these perspectives, and the importance of accurately gauging students’ learning, it’s imperative that you design valid, well-written tests.

Test Composition and Design: Where to Start?

Identify Course Goals
It might seem obvious, but one of the most important steps of test composition is to revisit your overall goals and objectives for the course and to determine which goals you intend to evaluate with this test, bearing in mind that a formal test or exam is not always the best way to evaluate the desired learning outcomes.  Once you have identified which outcomes you want to measure, consider what type of question or prompt best facilitates the students’ production of that outcome.  For instance, is it your intention to create a test asking students to recall definitions, or are you interested in having students demonstrate their ability to compare various concepts and defend their position on a controversial subject?

Many people utilize Bloom’s Taxonomy, a hierarchical structure of thinking skills, as a tool for gauging the cognitive depth of student learning.  Figure 1 depicts a more recent adaptation of Bloom’s Taxonomy, and it can be useful to keep in mind when constructing tests.  Consider, for instance, the difference between the first test or assignments of the semester, when you simply want to measure the students’ ability to understand or recall new information, and the final exam, in which you ask students to independently analyze data or situations or possibly create a project or document representative of the information covered throughout the semester.  The required skills or desired level of cognition will vary based on the educational objectives for each exam, so it is vital that you keep your pre-determined goals and objectives in mind throughout the test composition process.

For additional information on Bloom’s Taxonomy and sample questions for each level of cognition, please see the additional resources provided.

Determine Test Structure/Design
Research shows that, much as with learning styles, many students have a preferred test format, so in order to appeal to as many students as possible you might consider drawing from a variety of testing methods or styles.  In fact, you can design a single exam to include several kinds of questions and measure a range of cognitive skills.  Some common types of tests and test items are discussed below.

For further information on writing effective test questions, please contact the TLPDC for a consultation or refer to the online resources provided below.

 When grading subjective tests or test items, the use of an established set of scoring criteria or a well-developed rubric helps to level the playing field and increase the test’s reliability.  For more information on rubric development, please see the additional online resources provided.

Table II contains a chart showing advantages and disadvantages for a selection of test items. It’s important to note that this is not an exhaustive list, and remember that as the course instructor, you have the freedom to choose what form of assessment most aptly measures your specific learning objective.

Table II: Advantages and Disadvantages of Commonly Used Types of Achievement Test Items


| Type of Item | Advantages | Disadvantages |
|---|---|---|
| True-False | Many items can be administered in a relatively short time. Moderately easy to write and easily scored. | Limited primarily to testing knowledge of information. Easy to guess correctly on many items, even if material has not been mastered. |
| Multiple Choice | Can be used to assess a broad range of content in a brief period. Skillfully written items can measure higher order cognitive skills. Can be scored quickly. | Difficult and time consuming to write good items. Possible to assess higher order cognitive skills, but most items assess only knowledge. Some correct answers can be guesses. |
| Matching | Items can be written quickly. A broad range of content can be assessed. Scoring can be done efficiently. | Higher order cognitive skills difficult to assess. |
| Short Answer or Completion | Many can be administered in a brief amount of time. Relatively efficient to score. Moderately easy to write items. | Difficult to identify defensible criteria for correct answers. Limited to questions that can be answered or completed in a few words. |
| Essay | Can be used to measure higher order cognitive skills. Easy to write questions. Difficult for respondent to get correct answer by guessing. | Time consuming to administer and score. Difficult to identify reliable criteria for scoring. Only a limited range of content can be sampled during any one testing period. |

SOURCE: Table 10.1 of Worthen, et al., 1993, p. 261.

Test Composition and Design: Additional Considerations

Validity & Reliability

Two key characteristics of any form of assessment are validity and reliability.  As Atherton (2010) states, “a valid form of assessment is one which measures what it is supposed to measure,” whereas reliable assessments are those which “will produce the same results on re-test, and will produce similar results with a similar cohort of students, so it is consistent in its methods and criteria.”  These attributes provide students with the assurance they need to know that the test they are being given is fair and reflective of what has been covered in the course. 

To establish a valid test instrument, it is important to always be mindful of your pre-determined learning outcomes and goals.  This mindfulness will help to ensure that each question you develop is an accurate measure of the specified learning outcome.  An example of an invalid question is one which tests a student’s ability to recall facts when it was actually intended to assess a student’s ability to analyze information.

As Atherton (2010) describes it, another way to think of reliability is in terms of “replicability.”  Is there a general consistency in students’ overall performance on an exam?  If the exam is given to more than one class or over the course of multiple semesters, is there consistency between the various classes?  If so, the test is considered to be reliable.  Strategies such as writing detailed test questions or prompts, including clear directions, and establishing and communicating clear grading criteria will increase test reliability.  

Verifying validity and reliability in a written test can be challenging.  For instance, in the grading of a writing test, what exactly are you trying to measure?  The students’ writing abilities, content knowledge, all of the above?  Be sure that all of your expectations for an exam are communicated to your students well in advance and always be sure that your expectations mirror those articulated in the overall course goals.  Another way to increase test validity and reliability is to reexamine and possibly remove questions missed by a large majority of students.  If a significant percentage misses the same question, there is a definite possibility that the question was somehow unclear or was not representative of the intended learning outcome.
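As a rough illustration of the last point, the sketch below flags questions that a large share of students missed so they can be reexamined. The function name, the data layout (one dictionary of correct/incorrect marks per student), and the 70% threshold are all assumptions made for this example; the text above does not specify what counts as a "large majority," so choose a cutoff that suits your course.

```python
def flag_frequently_missed(responses, threshold=0.7):
    """Return {question: fraction_missed} for questions missed by more
    than `threshold` of students.

    responses: list of dicts, one per student, mapping a question label
    to True if that student answered it correctly.
    threshold: hypothetical cutoff (0.7 means "more than 70% missed it").
    """
    if not responses:
        return {}
    flagged = {}
    for question in responses[0]:
        # Count the students who got this question wrong.
        missed = sum(1 for student in responses if not student[question])
        fraction = missed / len(responses)
        if fraction > threshold:
            flagged[question] = fraction
    return flagged


# Hypothetical results for four students on three questions.
results = [
    {"Q1": True, "Q2": False, "Q3": True},
    {"Q1": True, "Q2": False, "Q3": False},
    {"Q1": False, "Q2": False, "Q3": True},
    {"Q1": True, "Q2": True, "Q3": True},
]
print(flag_frequently_missed(results))  # {'Q2': 0.75}
```

A flagged question is only a candidate for review. Whether it was unclear, unrepresentative of the learning outcome, or simply difficult remains a judgment for the instructor.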

Test Length
Another important aspect of test composition is time management, on the part of the professor as well as the student.  A common student complaint is that a test covered material never covered in class or had too many questions on something that was covered in only a few minutes.  When designing tests it is helpful to remember that topics on which you spent a significant amount of class time, through instruction and activities, should be appropriately emphasized on the test.  This does not mean that you should not include items that received less coverage in class; just be sure to maintain an appropriate balance.

Also, bear in mind that it will take students longer to complete the test than it would take you.  In his highly referenced book Teaching Tips (1994), Bill McKeachie outlines the following strategy for determining test length: “I allow about a minute per item for multiple-choice or fill-in-the-blank items, two minutes per short-answer question requiring more than a sentence answer, ten or fifteen minutes for a limited essay question, and a half-hour to an hour for a broader question requiring more than a page or two to answer.”
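To see how these per-item figures add up for a draft exam, the arithmetic can be sketched in a few lines of Python. The minute values come from the McKeachie quote above; the item-type labels, the function name, and the sample exam are assumptions made for illustration, and the result is only a rough planning estimate.

```python
# Per-item time estimates in minutes, taken from the McKeachie quote above.
# Where the quote gives a range, both ends are kept as (low, high).
MINUTES_PER_ITEM = {
    "multiple_choice": (1, 1),
    "fill_in_the_blank": (1, 1),
    "short_answer": (2, 2),
    "limited_essay": (10, 15),
    "broad_essay": (30, 60),
}


def estimate_test_minutes(item_counts):
    """Return (low, high) estimated minutes needed to complete a test.

    item_counts maps an item type (a key of MINUTES_PER_ITEM) to the
    number of items of that type on the test.
    """
    low = sum(MINUTES_PER_ITEM[kind][0] * n for kind, n in item_counts.items())
    high = sum(MINUTES_PER_ITEM[kind][1] * n for kind, n in item_counts.items())
    return low, high


# Hypothetical draft exam: 20 multiple-choice items, 5 short-answer
# questions, and 1 limited essay.
draft = {"multiple_choice": 20, "short_answer": 5, "limited_essay": 1}
low, high = estimate_test_minutes(draft)
print(f"Estimated time: {low} to {high} minutes")  # Estimated time: 40 to 45 minutes
```

If the estimate exceeds the class period, that is a signal to trim items or swap a broad essay for a more limited one before the test is given.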

Conclusion

Tests and exams often play a significant role in the overall assessment of students’ learning.  Therefore, as instructors, it is essential that we pay particular attention to the manner in which we construct these instruments.  Remember to always keep your course goals and learning objectives at the forefront of your mind as you begin to determine what kind of test is the best measure of your students’ learning.  To that end, if it fits with your course design and content, you may want to consider alternate forms of assessment such as group projects, student portfolios, or other activities that extend and build throughout the course of the semester.  These alternative or non-traditional forms of assessment frequently offer students a more authentic opportunity to apply their knowledge and higher-order thinking skills.

Creating tests and other forms of assessment can be a challenging task, but there are plenty of resources available to you.  If you would like assistance with test composition, or if you have questions about assessment in general, please contact the TLPDC for a consultation.