Standards-Based Grading in AP World History

Standards-based grading and assessment is difficult even without the pressures that go along with teaching any AP curriculum. When the two are combined, even seasoned AP teachers can struggle to find a way to align the standards-based philosophy and logistical requirements with their system of delivering both skills and content.

The combination of skills and content makes for one of the first hurdles. Historical content is incredibly important to success in any AP class; however, it can’t be spiraled throughout the units. This makes constant reassessment nearly impossible and obfuscates the significance of historical thinking skills (or reasoning processes, in College Board language). Proficiency scales that are written around content miss the forest for the trees. They embed major historical thinking skills as the defining differences between levels of proficiency, but either hide them as secondary characteristics or conflate multiple skills in ways that are counterproductive. Proficiency scales need to be skill-based, with the understanding that skills cannot be shown without a firm grasp of content. Teachers then rightfully ask: How does content get assessed? How are students held accountable for knowing and understanding content?

One possible solution is to create a measurement topic for content itself. The student’s percentage score can be converted to a standards-based score according to the Marzano scale or a similar one. This works well with multiple-choice questions of similar difficulty. Although AP World MCQs are stimulus-based and embedded with historical thinking skills, this approach makes it possible to assess content for the sake of content. Content knowledge becomes a standard in and of itself, equally weighted to other prioritized standards.
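
As a rough illustration of that conversion, here is a minimal Python sketch. The cut points and the function name `percent_to_scale` are hypothetical and are not taken from this article or from any published Marzano conversion chart; the half-point steps simply mimic a 4-point scale. Swap in whatever thresholds your school or department has agreed on.

```python
from bisect import bisect_right

# Hypothetical cut points (assumption): the minimum percentage needed
# to move up to the next scale score. Adjust to your own conversion chart.
CUT_POINTS = [50, 60, 70, 80, 90]
# One more entry than CUT_POINTS: the score below the first cut point,
# then the score earned at or above each cut point.
SCALE_SCORES = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0]


def percent_to_scale(correct, total):
    """Convert a raw multiple-choice result to a standards-based score."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    percent = 100 * correct / total
    # bisect_right counts how many cut points the percentage meets or exceeds.
    return SCALE_SCORES[bisect_right(CUT_POINTS, percent)]
```

Under these assumed cut points, a student who answers 45 of 55 questions correctly (about 82 percent) would land at 3.5 on the content measurement topic.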

One lesson I learned the hard way was not to prioritize too many standards. In my non-AP history courses, 12-15 standards were the most I could feasibly handle while ensuring multiple assessment opportunities with even distribution across units of study. There are far too many individual standards and sub-standards in the AP curriculum to reasonably assess them all multiple times. Grouping and prioritizing were how I solved this problem. Many of what College Board calls “Historical Thinking Skills,” such as primary source analysis, explaining historical developments and processes, or making connections, are practiced constantly as a part of “big tent” skills. I decided to create proficiency scales for the three main reasoning processes plus contextualization. These make up the next four measurement topics in addition to content.

The last measurement topic is designed with the DBQ and LEQ in mind. Yes, it is possible to assess the DBQ and LEQ skills through other measurement topics, but I did not want to minimize their importance. Using my subjective scale, student scores on both essays can be easily converted to standards-based scores. They can then be easily averaged and weighted in a standards-based grade book. The downside is that the individual skill behind one point becomes less visible than the aggregate score. However, students are still forced to reckon with the more complex points in order to reach the As and Bs they want. These expectations maintain rigor.
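
To make the conversion and averaging concrete, here is a minimal Python sketch. My actual scale is not reproduced here, so both lookup tables are hypothetical placeholders; they only assume the 7-point DBQ and 6-point LEQ rubrics. The function names and the equal default weighting are also assumptions for illustration, and a real grade book will likely do this arithmetic for you.

```python
from statistics import mean

# Hypothetical conversion tables (assumption): rubric points earned ->
# standards-based score. Replace with your own alignment.
DBQ_TO_SCALE = {0: 0.0, 1: 1.0, 2: 1.5, 3: 2.0, 4: 2.5, 5: 3.0, 6: 3.5, 7: 4.0}
LEQ_TO_SCALE = {0: 0.0, 1: 1.0, 2: 2.0, 3: 2.5, 4: 3.0, 5: 3.5, 6: 4.0}


def essay_topic_score(dbq_points, leq_points):
    """Average every converted DBQ and LEQ attempt into one topic score."""
    converted = [DBQ_TO_SCALE[p] for p in dbq_points]
    converted += [LEQ_TO_SCALE[p] for p in leq_points]
    if not converted:
        raise ValueError("at least one essay score is required")
    return mean(converted)


def overall_score(topic_scores, weights=None):
    """Weighted average across measurement topics (equal weights by default)."""
    if not topic_scores:
        raise ValueError("at least one topic score is required")
    if weights is None:
        weights = {topic: 1.0 for topic in topic_scores}
    total_weight = sum(weights[topic] for topic in topic_scores)
    weighted_sum = sum(topic_scores[topic] * weights[topic] for topic in topic_scores)
    return weighted_sum / total_weight
```

For example, `essay_topic_score([5], [4])` converts a 5-point DBQ and a 4-point LEQ using the placeholder tables and averages them; passing that result into `overall_score` alongside the other five topic scores gives a single equally weighted grade. Note that a plain average treats early and late attempts the same, so a teacher who wants to emphasize growth might prefer to weight recent essays more heavily.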

Having only six measurement topics (or “standards”) ensures that students have multiple assessment opportunities on critical skills and tasks, honoring the growth-over-time model embedded in SBG philosophy. There are many smaller skills and tasks that are not explicitly stated in these proficiency scales. However, they are embedded in my lessons, activities, and formatives.

Summative assessments can be larger unit exams that cover content only, or more focused, skills-based assessments organized according to levels of proficiency. I like that this forces me to think about assessing skill more explicitly than the traditional unit exam structure might. It also requires a mindset shift, moving away from summatives that take an entire block toward smaller and quicker assessment opportunities.

This is not a perfect system. The way I have aligned DBQ and LEQ scores to a standards-based score is open to debate. Other teachers may justifiably argue that primary source analysis deserves its own standard. This is just an attempt to make standards-based grading and assessment work in an AP context. I favor a simple system that protects the integrity of both skill and content while avoiding the dangers of over-engineering grading protocols.

