Formative Assessment with Dylan Wiliam

Presentation : 1 hour

Q&A : 30 minutes 

SSAT’s Embedding Formative Assessment programme 

Dylan Wiliam PhD

Emeritus Professor of educational assessment at University College London. 

This is the first of two linked presentations – the second one will be in February.

The focus of this presentation was on the kinds of learning Dylan Wiliam (DW) believes that schools and colleges need. A second presentation in February will be on the practicalities of organizing teacher professional development.

NOTE >  Where should our efforts be focused?

  • Where does formative assessment fit in?
  • What makes effective teacher learning?
  • What doesn’t get done (because schools are so busy that the only way to do something new is to stop doing something that you’re already doing.)
  • And how do we know it’s working?

Evaluating Teaching

DW “There are still schools that are trying to improve by evaluating teaching. Yet the evidence does not support this.”

Do we know a good teacher when we see one ? [Almost certainly not]. 

US study of those known to be good or poor.

People were shown seven videos and asked to identify the good teacher from the poor teacher. The results were still below chance.

(JV COMMENT: Surely a very poorly conceived test that assumes that a video clip alone could establish context. You might just as well look at their horoscopes).

They had some teachers who were definitely above average and some teachers who were definitely less effective than average. Yet they still got it wrong.

REF:  Learning about how human memory works as John Mason (Oxford and OU) 

NOTE > Learning is a change in long term capabilities. You cannot judge an educator’s effectiveness until you examine the long term effectiveness.

DW: ‘How much of what’s going on in this classroom today will these students remember in six weeks’ time?’

Robert Bjork, a psychologist at UCLA, has established that the more students struggle in the learning task – i.e. the less satisfactory the learning feels – the more of it is remembered.

[JV COMMENT> The character Jamal Malik recalled obscure facts in the film ‘Slumdog Millionaire’ because of the severe events around these specific facts – just as my late grandfather could name everyone he served with during the First World War, as well as the noises of every kind of ordnance being fired at him. The severity of the events etched them permanently on his brain].

NOTE and QUOTE > Dan Willingham (cognitive scientist) says, ‘memory is the residue of thought’.

DW QQ: Can we identify good teachers after training?

Ref: The Framework for Teaching by Charlotte Danielson (1996) and her colleagues at the Educational Testing Service

Danielson identified four domains of practice:

  1. > If you are [only] taught by a teacher who is above average on planning and preparation, or above average on professional responsibilities, you do not make more progress.
  2. > If you’re taught by a teacher who is rated high on ‘instruction’ or ‘classroom environment’, you do better. Just not very much.
  3. > If you’re taught by a teacher who is outstanding – you will learn about 30% more (for reading and math).

DW NOTE: We know that good teachers make a difference, but what makes the difference in teachers?

NOTE: Observations are unreliable and they’re biased 

Heather Hill at Harvard worked out that before you could make a qualified judgment about a teacher’s ability you would need to make 50 independent observations of their lessons. (Hill, Charalambous and Kraft, 2012).

[JV COMMENT > The answer is to use Big Data – using analytics rather than observations]. 

Self-selecting online surveys suck:

  • bias
  • skewed
  • unrepresentative
  • miss those we really want to hear from

DW: There’s also a problem with bias, which brings us to the work of Matthew P. Steinberg (University of Pennsylvania) and his colleagues, who looked at this as part of the Gates-funded Measures of Effective Teaching project. If you’re teaching a top set, your chance of being rated as an outstanding teacher is 34%. If you’re teaching a bottom set, your chance is 5%.

Ref: Carrell and West (2010) Does Professor Quality Matter? 

DW QUOTE & NOTE: ‘Teaching is a marathon not a sprint’. 

[JV COMMENT: Teaching has more to learn from sports coaching and teaching music than the other way around]. 

NOTE: Students do not know when they’re being well taught.

Thomas Kane and his colleagues combined different measures of teacher performance:

  1. value-added estimates
  2. observations, using things like the Danielson Framework for Teaching
  3. student perception surveys, such as Ron Ferguson’s Tripod Survey (asking students about the most effective teachers).

To predict next year’s scores on standardized tests, the best weighting of these three measures was:

  • 81% on value-added estimates 
  • 17% on class observations 
  • 2% on student perception surveys.
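As a rough sketch of what such a weighting means in practice, the snippet below combines three standardized scores for one teacher using the percentages listed above. The teacher's scores and the names `WEIGHTS` and `composite_score` are invented for illustration, and the notes do not say how the original study scaled its measures, so treat this as an assumption-laden toy rather than the study's method.

```python
# Weights taken from the percentages reported in these notes.
WEIGHTS = {"value_added": 0.81, "observation": 0.17, "student_survey": 0.02}


def composite_score(scores):
    """Weighted sum of one teacher's standardized (z-score) measures."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical teacher: slightly above average on value-added, strong in
# observations, and rated poorly by students.
example = {"value_added": 0.5, "observation": 1.0, "student_survey": -1.0}
print(round(composite_score(example), 3))
```

With these weights the student survey barely moves the composite, while the value-added estimate dominates it.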

BUT when they looked at how good these teachers were at teaching for higher-order assessments, the correlation was only 0.29 and the reliability was 0.5, which is way too low to make any kind of high-stakes decision.

NOTE: You need nine years of data.
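The notes do not say how the nine-year figure was reached. One standard way to reason about it, offered here purely as an assumption, is the Spearman-Brown formula, which estimates the reliability of an average of several comparable measurements. The function names below are illustrative only.

```python
def reliability_of_average(single_reliability, k):
    """Spearman-Brown estimate for the mean of k parallel measurements."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


def measurements_needed(single_reliability, target):
    """Smallest number of measurements whose average reaches the target."""
    if not (0 < single_reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = 1
    while reliability_of_average(single_reliability, k) < target:
        k += 1
    return k


# Assumption: one year of data has reliability 0.5, and we want 0.9.
print(round(reliability_of_average(0.5, 9), 2))
print(measurements_needed(0.5, 0.9))
```

Under that assumption, a single-year reliability of 0.5 only reaches 0.9 once nine years are averaged, which is at least consistent with the note above.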

Even by combining information from multiple sources there is still a massive amount of uncertainty about who the good teachers are and who the less able teachers are. 

What would happen if we removed the teachers who are low performing?

> if they have one bad year

> if they’re bad on average over two years.

> fire 25% of your teachers at the end of the year. The net impact on student achievement is three days more learning in a year.

Marcus Winters and his colleagues looked at the effect in Florida. 

(JV COMMENT: The teacher is not the most important factor in the mix or the only one – what about behaviours learnt at home ?)

‘We can’t fire our way to Finland’. [Linda Darling-Hammond].

[Ref: “We have the resources to solve the problems of inequity and low achievement,” she said, “we just aren’t using them well.” “You can’t fire your way to Finland; you can’t fire your way to excellence,” she noted, referring to one of the world’s leading countries in educational achievement. REF: Policy Meet > She also told a story about her husband rummaging around the house early one morning before a trip: he couldn’t find his rucksack. “What, the one on your back?” she said.]

[JV COMMENT: What about improving everything else too? Parental, cultural and societal expectations? Providing the means for anyone who wishes to progress to do so independently (after all, why else do we have municipal libraries?) and closing the divide caused by digital poverty by ensuring that students who want to learn have the means and the space to do so? Turn a few hundred comprehensives into boarding schools!].

If “flourishing fulfilled lives is the SSAT goal …”

DW NOTE: What we need to do is to improve every single teacher.

There’s no next big thing or ‘magic bullet’ in education but there are lots of small things.

There is no evidence that these work: 

  • Brain Gym
  • Preferred learning styles
  • Policy tourism
  • Educational Neuroscience 
  • Getting smarter people in to teach

These work … perhaps

  • Differentiation might work (but we don’t yet know).
  • Lesson study and learning study (imported from Japan) 
  • Grit and perseverance alone (they have to want to learn the topic though)
  • Fire bad teachers 

These probably work

  • Social and emotional needs
  • Smaller class sizes 
  • A growth mindset and ever more challenging work to do [which in turn requires differentiation]. 

Wait Time

Every teacher knows the research on wait time but they don’t do it, and reminding teachers of the research doesn’t have any impact on their practice.

[JV TIP: Put up the questions and use a countdown clock. Then remain shtum.]


Meta-analysis is a very powerful technique. It has revolutionized research in the health sciences – but not in education. There are lots of inappropriate comparisons made in the field.

It’s not teacher collective efficacy that improves student achievement; rather, it’s student achievement that improves ‘teacher collective efficacy’.

The File Drawer Problem

DW: The ‘file drawer problem’ is the name given to the fact that it’s 12 times easier to get research studies published if the findings are significant – the rest end up in the researcher’s ‘file drawer’.

If they only used pre-registered and reproduced interventions … replicable, so that you weren’t just looking at a fluke, what they found was that the average effect size in reality was about a third of that reported in the meta-analysis, because these effects were just not as stable as they appeared. That’s the file drawer problem.

There’s variation in intervention quality. If you ask whether class size reduction works, you need to compare reductions of 10% with reductions of 50%. The difficulty is that reductions of 50% are expensive. So the research that tends to get done is the cheapest research, which therefore produces underestimates of the effect.

When everybody’s using the same measure, you can compare research studies. But if two different research studies use different measures of, for example, maths achievement, then you don’t know that they’re equally sensitive to the intervention that you’re evaluating.

And when you have younger students you get larger effect sizes, because they’re less diverse: the faster students have had less time to pull away from the slower learners. So effect sizes are influenced by the age of the children in the study.
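A small worked example may help here. A standardized effect size divides the gain by the spread of scores in the group, so the same raw gain looks larger in a group whose scores are less spread out. The numbers and the function name below are hypothetical, chosen only to illustrate the arithmetic.

```python
def standardized_effect_size(mean_gain, spread_sd):
    """Gain expressed in units of the group's standard deviation."""
    if spread_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_gain / spread_sd


# Hypothetical: the same intervention adds 5 marks in both age groups.
gain_in_marks = 5.0
younger_sd = 10.0  # younger students: scores less spread out
older_sd = 20.0    # older students: faster learners have pulled away

print(standardized_effect_size(gain_in_marks, younger_sd))
print(standardized_effect_size(gain_in_marks, older_sd))
```

In this toy case the younger group shows an effect size twice as large as the older group, even though the intervention added exactly the same number of marks.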

So what I think it means is that some problems are unavoidable, but some problems are avoidable. You can avoid inappropriate comparisons. You can control for the file drawer problem by testing to see whether the biggest results are in the smallest experiments.
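One crude way to run the check described above is to correlate each study's effect size with a measure of how small the study is. The sketch below uses 1/sqrt(n) as that measure and a plain Pearson correlation. Both choices are assumptions made for illustration, the study data are invented, and a real meta-analysis would use an established small-study or funnel-plot test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def small_study_signal(studies):
    """Correlate effect size with study 'smallness' (1 / sqrt(sample size)).

    A strongly positive value means the biggest results sit in the
    smallest experiments, which is a warning sign for the file drawer problem.
    """
    smallness = [1 / sqrt(n) for n, _ in studies]
    effects = [effect for _, effect in studies]
    return pearson(smallness, effects)


# Hypothetical studies as (sample size, reported effect size).
studies = [(20, 0.9), (40, 0.7), (100, 0.4), (400, 0.25), (1600, 0.2)]
print(round(small_study_signal(studies), 2))
```

A value near zero would suggest that reported effects do not depend on study size; a value close to 1, as in this invented data set, is the pattern to worry about.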

The problem is that meta-analysis in education often fails to do these things, even when published in reputable journals.

DW NOTE: meta-analysis is hard to do well anywhere in education.

In education we need to ask four questions: 

  1. Does it solve a problem you have ? 

[JV NOTE: Creative Brief QQ1 ‘What is the problem?’]

The impact of teachers’ subject knowledge on student achievement is irrelevant if all your teachers have very good subject knowledge. 

  2. How much extra achievement will it yield?

I want to know how many extra months of learning we get, because if you can’t answer that question, then you have no idea how effective your intervention is. It’s the only direct measure: any improvement in education manifests itself in an improvement in the rate at which students are learning, or a reduction in the time students take to learn something.

So, in other words, any improvement in education increases the rate of learning. So let’s find out. It may be hard to estimate, and we may have very big confidence intervals, but I think that’s the only thing that we should be focusing on.

  3. How much will it cost?

  4. Can we implement it here?

That’s why school leaders need to be critical consumers of the research.

There’s no magic formula here, but I do think the research suggests that there are two things that are worthy of your serious consideration.

Two best bets:  

  1. A knowledge rich curriculum
  2. Greater use of classroom formative assessment.

Knowledge rich means – how things fit together.

So in science, for example, what we aim to do is equip our students with a large number of powerful laws to reason with, and so I don’t mean knowledge of facts. I mean the students have ways of thinking about things. They have a good sense of chronology in history.

They have a wide repertoire of problems in mathematics.

The idea is that we need to focus our knowledge.

What you can do is equip young people with more things to think with: more powerful ways of reasoning about the world that are exemplified by the different subject disciplines.

And that’s what we mean by a ‘Knowledge Rich Curriculum’. Here’s a problem.

We don’t know how to do it yet.

We now know, from theoretical work, from empirical work on feedback, and most recently from randomized controlled trials that gave schools materials for doing formative assessment better in classrooms, that students learn more.

DW NOTE > Formative assessment is a multi-layered process.

Short (in the class), medium (6 to 10 weeks) and long term cycles (at the end) – all are important. You cannot just choose some of them. 

[JV COMMENT : with data students are assessed all of the time]. 

You need some kind of common assessment measure to see whether the students are making progress, and if some of them aren’t, you do something about it. This goes back to Benjamin Bloom’s work on ‘Mastery Learning’.

DW NOTE > If you’re getting a bell curve – a normal curve of results, you’re not doing your job. This is because the normal curve is what nature gives us. 

Students differ in their aptitude for learning.

Our job as teachers is to destroy the bell curve – and that’s where standard assessments (common assessments across a whole year group) force us to confront the fact that these students did not learn what they were taught, and to do something about it.

Don’t accept the bell curve: do something about it.

So we do all these things.

The long cycle and the medium cycle stuff largely involves working on classroom procedures. The short cycle stuff involves changing what our teachers do – day in day out in classrooms.

NOTE > It is much easier to change what teachers do when students are not present than it is to change what teachers do when students are present. And so although all these aspects of assessment are important, they require different kinds of professional development to do effectively.

You’ve probably seen this framework. 

It’s one that my colleagues and I have developed: clarifying, sharing and understanding learning intentions, eliciting evidence of learning, giving feedback, students as resources for one another, and students as owners of their own learning. These, I think, effectively define the terrain of formative assessment (you can find out more about this on YouTube).

The big idea here is that each of these strategies has a research base showing that it improves student learning.

Together, they form a maximally powerful subset of all the things you might do.

Further support for this was provided by the Educational Foundation, which listed the three most cost-effective interventions as:

  1. closing achievement gaps
  2. feedback
  3. peer tutoring.

You can’t give feedback until you find out what was going wrong.

You need to elicit evidence of learning, and you don’t know what evidence to elicit until you’re clear about what it is you want your students to learn.

Some people prefer the term responsive teaching.

Students don’t necessarily learn what we teach them.

Formative assessment is just good teaching.

Beware poor evidence. Bland open questions such as ‘are you OK with that?’ or the traffic light system are not enough.

REF: Dunning-Kruger

How can we design for scale? Many researchers are working with a handful of teachers, and my view is that that’s probably not that helpful.

We need to be thinking about ways of improving 300,000 classrooms in England, and around two million classrooms in the United States.

And so I think the model we’ve been developing for formative assessment is a single model for the whole school.

And so we’ve been thinking about scale in terms of Cynthia Coburn’s work:

  • How do you get it embedded in the work of the school?
  • How do you make it sustainable?
  • How do you make sure that it spreads? Ultimately, as part of sustainability, there really needs to be a shift in reform ownership.

The problem with education reform is the diversity of contexts of application.

There are far too many differences that need to be built on, so that’s why I think it’s so important that teachers are involved in the design of this and in making it work in their own context.

Most teachers get better for much longer than people think. If you are in a school where there’s a good strong professional learning environment, you will get that good after just four years. But if you’re unlucky enough to be in a school where the professional environment is at the bottom 25% you’ll probably never get that good.

DW NOTE > So there’s now quite convincing evidence that teacher collaboration has a significant impact on teacher quality. And so we can  begin to design programs that harness the power of collective action. 

DW NOTE > We have conceptualized teacher expertise as being mostly a matter of knowledge. And I think it’s not, I think it’s a matter of practice.

“What works?” is not the right question in education, because everything works somewhere and nothing works everywhere. Teaching is mainly a matter of phronesis.

The work of Nonaka and Takeuchi.

A large amount of the knowledge that people have is tacit.

And people can get explicit knowledge by being told it explicitly, but the important point is that there are two other processes: one where tacit knowledge becomes explicit because we’re forced to talk about it, and one where we get advice from others that we internalize and figure out what to do with for ourselves.

What this suggests is that in organizations where tacit knowledge is an important aspect of expertise, you need to develop a knowledge-creating spiral that involves dialogue, networking, learning by doing and sharing experience.

This idea of combining dialogue, networking, learning by doing and sharing experience has driven our approach to education improvement, because it is very clear that the kind of expertise that expert teachers have is not really easily reducible to words. The expertise that teachers have is more like the knowledge of riding a bicycle than it is like being able to solve an equation.

Formative Assessment for Teaching.

So the first thing is for the school to clarify a vision of what good teaching looks like. Then to systematically elicit evidence of where teachers are with respect to:

  • teachers providing feedback.
  • teachers as resources for one another
  • teachers as owners of their own learning.

That is the psychological job of becoming a self-regulating learner.

Teachers need to become better analyzers of their own practice so they can continue to improve their practice whether somebody observes them or not.

The problem right now in education when we  look at a new innovation 

DW NOTE > The messy truth here is that the really important essence of effective leadership is stopping teachers doing good things to give them time to do even better things.

I think teachers are often spending too much time marking, and they say, ‘So you’re saying what I’m doing is no good?’ It is good, but just think: the hour you spent marking that set of books might have produced even more learning if you had used it in a different way.
