Showing posts with label Jonas Vlachos. Show all posts

Friday, 20 December 2013

Is an equitable school a good school?


WARNING: THE GRAPHS IN THIS BLOG POST SHOW CORRELATIONS. THEY ARE USED ONLY TO DISCUSS RELATIONSHIPS THAT ARE CLAIMED TO EXIST IN THE DEBATE, NOT TO DEMONSTRATE CAUSATION.

In a Twitter conversation with Jonas Vlachos yesterday, I discussed how the Education Act's notion of equity – that which school you attend should not matter for your results – can possibly only be satisfied by falling average results. Jonas argued that "high average quality and small differences between schools nevertheless appear to go hand in hand, even if the mechanism is unclear. So the empirical evidence speaks against the claim that equity requires lower quality."

This is something often repeated in the debate: an equitable school is a good school. On DN Debatt a couple of weeks ago, for example, Stefan Löfven and Ibrahim Baylan claimed that "the OECD has clearly shown that the school systems in the world that perform best are also the most equitable."

But is this true? I set aside entirely the problem of what causes what and focus only on whether the claim that "an equitable school co-varies with a good school" holds up.

Let us take a look at the latest results from PISA 2012, which in the debate seem to be regarded as the most important measure of whether a school system is good (although I do not agree with that). In the first graph we find a positive correlation between relatively high between-school variance and results. The measure here is the variance between schools as a percentage of the total variance in the country (expressed as a percentage of the average variance across all countries). This gives no support to the claim that small differences between schools and high results go hand in hand.
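To make the measure concrete, here is a minimal Python sketch of the variance decomposition behind it. The school names and scores are entirely invented for illustration; PISA's actual estimation is more involved (sampling weights, plausible values), so this only shows the idea of splitting total variance into a between-school and a within-school part.

```python
# Sketch: between-school variance as a share of total variance.
# Data are hypothetical; real PISA estimates use survey weights.
import statistics

# Hypothetical pupil scores grouped by school
schools = {
    "A": [480, 510, 495, 520],
    "B": [530, 545, 560, 550],
    "C": [470, 460, 490, 475],
}

all_scores = [s for pupils in schools.values() for s in pupils]
grand_mean = statistics.mean(all_scores)

# Between-school variance: variance of school means, weighted by school size
between = sum(
    len(p) * (statistics.mean(p) - grand_mean) ** 2 for p in schools.values()
) / len(all_scores)

# Total (population) variance, so that between + within = total
total = statistics.pvariance(all_scores)

share = 100 * between / total  # between-school variance as % of total
print(f"between-school share of total variance: {share:.1f}%")
```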




In the graph below I also hold within-school variance constant, but we still find no positive relationship between low between-school variance and high results. On the contrary, higher between-school variance remains positively related to average results.




But what happens if we focus on differences between pupils, which I consider a more reasonable measure of equity? That does not support the claim that an equitable school goes hand in hand with a good school either. Instead, we see a positive correlation between high inequality and high performance. The first graph shows the relationship between results and the size of the gap between high- and low-performing pupils (the difference between pupils at the 95th and 5th percentiles). The larger the differences between pupils, the higher the results. Incidentally, the same analysis appears in the PISA report, but there the gap between high- and low-performing pupils is defined as the difference between pupils at the 90th and 10th percentiles. The results are similar.
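The percentile-gap measure can be sketched as follows. The scores are simulated, and the nearest-rank percentile helper is a deliberate simplification of how PISA computes percentiles:

```python
# Sketch: inequality as the gap between high- and low-performers,
# i.e. the 95th minus the 5th percentile of pupil scores.
import random

random.seed(0)
scores = [random.gauss(500, 90) for _ in range(5000)]  # PISA-like scale

def percentile(data, p):
    """Simple nearest-rank percentile on sorted data (0 < p < 100)."""
    s = sorted(data)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

gap_95_5 = percentile(scores, 95) - percentile(scores, 5)
gap_90_10 = percentile(scores, 90) - percentile(scores, 10)  # PISA report's variant
print(f"95-5 gap: {gap_95_5:.0f}  90-10 gap: {gap_90_10:.0f}")
```

By construction the 95-5 gap is wider than the 90-10 gap, which is why the two definitions give similar country rankings.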




The graph below instead shows the relationship between the strength of the effect of pupils' background and the results. There too, no positive correlation between equity and results appears. In fact, there is no relationship at all between the effect of background and average results (the association is not statistically significant).




I want to emphasise strongly, once again, that none of the above has anything to do with causation. But I question the argument, often heard in the Swedish debate, that there is a correlation between equity/equality and results.

UPDATE: I also checked whether the coefficient of variation (standard deviation/mean) is lower in countries with high average results. Larger differences, once countries' mean results are taken into account, are indeed negatively correlated with average results (whereas the absolute standard deviation is instead positively correlated with results). I maintain, however, that relative dispersion is not the most relevant measure of equality. Instead we should focus on absolute dispersion. Why? Because if we are interested in equality within countries, it is not clear that we should take the mean into account, since we would then be rewarding countries with high results – and then we would be studying a different question from that of equality. (See, for example, Jonas Vlachos, who discusses changes in equity in TIMSS and PISA based on the differences between high- and low-performing pupils, as I do above.)
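The distinction can be made concrete with two invented "countries" that have identical absolute spread but different means; the numbers are hypothetical and chosen only to isolate the effect of dividing by the mean:

```python
# Sketch: coefficient of variation (sd / mean) vs absolute sd.
import statistics

# Two hypothetical countries, same spread of scores, different means
high_mean = [560, 590, 610, 640, 600]
low_mean = [400, 430, 450, 480, 440]

sd_high = statistics.pstdev(high_mean)
sd_low = statistics.pstdev(low_mean)

cv_high = sd_high / statistics.mean(high_mean)  # coefficient of variation
cv_low = sd_low / statistics.mean(low_mean)

# Identical absolute dispersion, yet the high-performing country looks
# "more equal" on the relative measure simply because its mean is higher.
print(f"sd: {sd_high:.1f} vs {sd_low:.1f}; cv: {cv_high:.3f} vs {cv_low:.3f}")
```

This is exactly the sense in which the coefficient of variation rewards high-scoring countries and so answers a different question from within-country equality.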


Friday, 9 September 2011

Grade inflation, educational achievement and the profit motive: a comment on the SNS report

Yesterday, SNS published a report reviewing the research on whether the introduction of competition in the Swedish public sector has generated large efficiency gains. It finds that there is little evidence of this. The chapter on free schools and competition, written by Jonas Vlachos, deserves a comment. In general, I agree entirely with Vlachos’s conclusion that decentralisation and competition require more accountability regarding grading practices in Swedish schools – which I have argued here.

In this post, however, I focus on Vlachos’s findings regarding the impact of for-profit/non-profit schools on educational achievement/grade inflation – which have been cited in Swedish media as evidence that for-profit schools underperform. I also present new evidence indicating that profit-seeking compulsory schools suffer from slightly higher grade inflation than other school types, although the quantitative estimates are small.

Vlachos shows that pupils in for-profit/non-profit compulsory schools perform better in terms of average grades than pupils in municipal schools, and also that the effect on scores on standardised tests is stronger than on the GPA. (There is no attempt to deal with potential endogeneity or two-way causality.)[1] Although certainly far from methodologically flawless, partly due to data availability, my study from 2010 – using OLS and IV models – also finds that for-profit schools perform better in terms of average ninth grade GPAs.

However, arguing that average grades and test scores may not be comparable between schools with different ownership structures – given Sweden's extremely decentralised grading practices – Vlachos also finds a negative effect of for-profit schools. When employing standard control variables accounting for socio-economic background, age, sex, immigration etc., pupils who attended for-profit compulsory schools obtain slightly lower GPAs in upper-secondary school – by 0.085 of a standard deviation, relative to their ninth grade GPA – than those who attended municipal compulsory schools. Vlachos suggests two possible explanations: (1) for-profit schools are worse than municipal schools, or (2) for-profit schools may have higher grade inflation in ninth grade, presumably due to the profit motive.

Firstly, although not presented in the report (the figure was kindly supplied by Vlachos), the R2 value for this regression is 0.03. In other words, despite the variables employed, the model explains only 3% of the variation in the difference between pupils' ninth grade and upper-secondary school GPAs. This strongly suggests that other, omitted – and much more important – factors determine why pupils perform better or worse in upper-secondary school than they did in ninth grade.

This leads to my second point. Vlachos’s suggestions regarding how one should interpret the regression ignore three years of upper-secondary education (and life changes!), which his model treats as a black box. For example, it could equally be the case that the differences in grades are generated by differences in upper-secondary schools (in terms of equally decentralised grading practices, quality of education, teacher quality, pupil motivation etc). There is also no control for whether the pupil attended a for-profit/non-profit upper-secondary school.

It should be pointed out that Vlachos certainly admits this to a certain extent: there could be systematic differences between schools with different ownership structures regarding the upper-secondary school programmes their pupils choose (i.e. natural or social sciences, or non-academic, such as the arts, media, and handicraft programmes etc).

But this has strong implications. Pupils’ GPAs are much less comparable in Swedish upper-secondary schools than in compulsory schools, due to vast differences between pupils’ education in the former (they choose programmes focusing on different subjects – such as social sciences, natural sciences, vocational subjects, performing arts etc). When analysing overall educational attainment at Swedish upper-secondary school, one compares pupils studying vastly different subjects. This is essentially tantamount to comparing apples with oranges.

Another issue is a strong potential for selection bias. For example, if pupils in for-profit compulsory schools systematically choose more difficult academic programmes, with more difficult courses, in upper-secondary school, this could have a negative impact on their GPAs. On the other hand, if they choose vocational programmes to a larger extent, the incentive to care about grades declines significantly since these are not as important for the pupils' future careers as for those who choose academic programmes. This, of course, is purely speculative. The point is that there are numerous possible reasons for why upper-secondary school pupils perform better/worse than they did in certain compulsory schools.

The overall conclusion, therefore, is that it is doubtful whether one can compare overall educational attainment at the upper-secondary school level – at least if one doesn’t take into consideration which programmes and courses pupils choose.

A better alternative (1) would be to compare the difference between pupils’ grades in a specific course – say Mathematics A after the first year of upper-secondary school – with their final maths grade in compulsory school. Alternatively (2), one could analyse differences between pupils’ GPAs in ninth grade and their GPAs after the first year in upper-secondary school, since the latter reflect achievement in mandatory courses for pupils in all programmes. Yet another alternative (3) is to also include the second and third year upper-secondary school GPA in the analysis, as Vlachos has done, but restrict the sample to pupils choosing academic tracks (see Böhlmark and Lindahl 2008).

Having shown that Vlachos's results should be interpreted with caution, it is also important to compare scores on standardised tests with final grades in ninth grade. This provides a measure of the comparability of final grades between for-profit/non-profit and municipal schools. The advantages are that one does not have to account for confounding variables at the upper-secondary school level, which clearly are at play here, and that one analyses average achievement in subjects that are mandatory for all pupils. The disadvantage is that these tests are still marked by individual teachers, so it is doubtful whether the scores are strictly comparable.
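The grading-gap measure used below can be sketched as follows, using an invented handful of pupil records on the old Swedish IG/G/VG/MVG scale. This illustrates the idea (share of pupils graded above their test result minus the share graded below it), not the exact SIRIS computation:

```python
# Sketch: net share of pupils whose final grade exceeds their test result.
# Pupil records are invented for illustration.

# (test_grade, final_grade) pairs on the same ordinal scale
pupils = [
    ("G", "VG"), ("VG", "VG"), ("VG", "MVG"), ("G", "G"),
    ("MVG", "MVG"), ("VG", "G"), ("G", "VG"), ("IG", "G"),
]

order = {"IG": 0, "G": 1, "VG": 2, "MVG": 3}  # Swedish grade scale at the time

higher = sum(order[final] > order[test] for test, final in pupils)
lower = sum(order[final] < order[test] for test, final in pupils)
gap = 100 * (higher - lower) / len(pupils)  # percentage points
print(f"net share with higher final grade than test result: {gap:.1f}%")
```

A positive gap for one school type relative to another is what the regression coefficients in the table below pick up.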

In a footnote on page 87 of the SNS report, Vlachos notes briefly that there are no differences between for-profit and municipal schools in this respect. However, in my analysis, presented below, I do find that for-profit schools in 2009 set slightly higher grades than municipal schools relative to how their pupils performed on standardised tests in mathematics and English (but not in Swedish). Yet the differences are quantitatively small: for-profit schools give higher final grades to 2.5-3.3% more pupils than municipal schools do.[2]

Differences between test scores and final grades (2009)
(% of pupils with higher minus % of pupils with lower final grades compared to test scores)

                                     English        Swedish        Maths
For-profit free school               2.5**          1.3 n.s.       3.3**
Non-profit free school              -0.9 n.s.      -0.8 n.s.       0.6 n.s.
Pre-reform free school               4.3*           0.4           -0.05 n.s.
Boys                                -9.7**         -6.2 n.s.      -2.9 n.s.
Immigrant_1st_gen                   10.6*          -1.8 n.s.       4.8 n.s.
Immigrant_2nd_gen                  -11.9***        14.7**          1.4 n.s.
Parents' education                 -12.8***        -0.8 n.s.     -10.5***
Teacher density                     -0.03 n.s.      0.15**        -0.1*
Urban dummy                         -0.07 n.s.      1.1 n.s.       0.18 n.s.
Average income (municipality)        0.03**         0.02**        -0.0007 n.s.
% in free schools (municipality)     0.02 n.s.     -0.07 n.s.     -0.007 n.s.
N                                    1,275          1,266          1,269
Adj. R2                              0.06           0.03           0.02

Note: *p<0.1, **p<0.05, ***p<0.01; unstandardised coefficients, robust standard errors.


Furthermore, similarly to Vlachos's results, the low adjusted R2 values suggest that only about 2-6% of the variation in the grading gap can be explained by variables normally associated with pupil achievement. Again, therefore, the vast bulk of the differences in grading practices between compulsory schools seemingly depends on other factors.[3]

Since my study at the IEA analyses ninth grade school GPAs, the findings should of course also be interpreted with caution. Yet, given the small quantitative estimates regarding differences between test scores and final grades – as well as the overall lack of explanatory power of both Vlachos’s and my regressions here – there is little hard evidence suggesting strong differences in grading practice between compulsory schools with different ownership structures on average. This could imply that ninth grade GPAs across different school types are relatively comparable. It is certainly difficult to argue that there is strong evidence that for-profit schools are worse than municipal ones based on Vlachos’s findings.

Regardless, grading practices in Swedish education must be radically reformed. According to the SIRIS database, at some schools every pupil (100%!) receives a higher final grade than his or her standardised test result – while for some schools, the figure is 0%. This strongly suggests that grades are currently not comparable across individual schools, which has to be remedied.

One alternative is to centralise all formal grading in both compulsory and upper-secondary schools; another is to allow upper-secondary schools and universities to accept more pupils by other means than the GPA, as I’ve argued here. Competition should always be accompanied by strict accountability, but Sweden’s decentralised grading system, in combination with its extremely centralised admissions system, does not allow for such accountability.

----
[1] However, the effects of attending for-profit compulsory schools do not translate into higher GPAs in upper-secondary school (the effect is not statistically significant), and the non-profit effect is small. Furthermore, there is no evidence that pupils in free schools overall performed better in the latest PISA test.

[2] It should also be noted that these differences may not exist – or may be more pronounced – in other years, which I am now researching. I will also analyse data from 2010, for which grades in biology, physics and chemistry are available.

[3] However, it is interesting that the higher the level of parental education, the lower the grade inflation in mathematics and English (while there is no effect in Swedish). This suggests that the fear that grade inflation would disadvantage pupils from lower socio-economic backgrounds – since they purportedly do not pressure teachers for higher grades to the same extent as pupils from higher socio-economic backgrounds – is unfounded.