Why is SFUSD failing to meet its 3rd grade literacy goal?
It was always unattainable but why has there been no post-pandemic recovery?
A couple of years ago, SFUSD set itself the goal of attaining a 70% proficiency rate among 3rd graders on the ELA portion of the SBAC test by 2027. The district has been dutifully monitoring its progress towards this goal at regular intervals. The proficiency rate stood at 52% when the district set the 70% goal and now stands at 49%. I’m not going to criticize the district for failing to meet the target. I called the target “impossible” when it was first adopted so the regular monitoring meetings have always struck me as being like slow-motion car crashes. Everyone knows it’s going to end badly. The only question is how badly.
Today, we’re going to look at how SFUSD’s performance compares with other districts and we’ll drill into SFUSD’s numbers a bit to try to figure out what’s going on.
Proficiency Rates and Average Scores
It is conventional to use the proficiency rate (or the “percentage who meet or exceed the standard” to give it its formal name) as a measure of educational attainment in a district. Proficiency rates can be calculated for an entire school or district by comparing each student’s score with the grade-appropriate proficiency threshold. They are also easy to interpret.
When we’re dealing with just one grade, as we are here, there is another metric we can use: the average SBAC score of the students in that grade. The average score can be harder to interpret (is 2400 a good score or a bad score?) but has the intuitive advantage of valuing the performance of every student in the grade. In contrast, the proficiency rate invites a focus just on those close to the proficiency threshold. The progress of those well below and above the proficiency threshold becomes irrelevant because there is no chance of them crossing that threshold in either direction.
The relationship between the proficiency rate and the average score is extraordinarily tight. The chart below shows that they move virtually in lock-step, as the R-squared of 0.993 indicates. This means it’s not really possible to change the proficiency rate without changing the average score and vice versa.
Recall that SFUSD’s goal is 70% proficiency in 3rd grade. The chart shows that 70% proficiency is attained by districts with an average score of about 2475. SFUSD’s 3rd graders had an average score in 2024 of 2422.7 so they’re over 50 points short.
It’s not shown on the chart but the average score of SFUSD’s 4th graders in 2024 was 2469.4, which is still below the 3rd grade target. To reach its goal, its 3rd graders have to be better than its 4th graders are today. In other words, the district has to cram more than an extra year of learning into the K-3 period. That is one reason why I called the goal impossible in the first place.
The rest of this post will refer to average scores rather than proficiency rates because we’ll be doing some calculations that will make sense if done with average scores but which might raise your hackles if done with proficiency rates. It’s easy to convert between the two. The trend line that best fits all the points in the chart above has a slope of 0.413. This means that if the average score goes up by 10 points, the proficiency rate will go up by about 4.13 percentage points. As a rough rule-of-thumb therefore, a one percentage point increase in the proficiency rate goes along with a 2.5 point increase in the average score.
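For readers who want to do the conversion themselves, here is a minimal sketch in Python. It assumes the relationship is linear with the 0.413 slope quoted above; the function names are mine, purely for illustration, and the 2475 figure is the approximate average score read off the chart for 70% proficiency.

```python
# Slope of the trend line quoted in the post: percentage points of
# proficiency gained per one-point increase in the average SBAC score.
SLOPE_PP_PER_POINT = 0.413

def rate_change_from_score_change(score_change):
    """Approximate change in proficiency rate (percentage points)."""
    return SLOPE_PP_PER_POINT * score_change

def score_change_from_rate_change(rate_change_pp):
    """Approximate change in average score needed for a given rate change."""
    return rate_change_pp / SLOPE_PP_PER_POINT

# Gap between SFUSD's 2024 3rd grade average and the roughly 2475
# average associated with 70% proficiency.
gap_points = 2475 - 2422.7

print(round(rate_change_from_score_change(10), 2))          # 4.13
print(round(score_change_from_rate_change(1), 2))           # 2.42
print(round(rate_change_from_score_change(gap_points), 1))  # 21.6
```

The last figure lines up with the gap between the current 49% proficiency rate and the 70% goal, and the middle one is where the "about 2.5 points per percentage point" rule of thumb comes from.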
Pandemic Learning Loss and Recovery
The pandemic did not affect every district equally. 3rd graders in most districts are not doing as well as they were before the pandemic but the students in districts that were already low-achieving in 2019 did worse than students in districts that were already high-achieving in 2019. There are plenty of exceptions to that general statement. Some districts, led by Palo Alto, did better in 2024 than in 2019. On the other hand, Fremont is one of the highest-achieving districts in the state but its score has gone down by 28 points. Meanwhile, San Francisco’s score has gone down by 10.8 points which is not as bad as the trend line would imply.
We can examine the effect of the pandemic, and the recovery from it, more closely by comparing what happened during the pandemic (i.e. the difference between the 2019 and 2022 scores) and what has happened since (i.e. the difference between the 2022 and 2024 scores).
We can see that there has been a bit of a bounce back. The trend line slopes down, indicating that districts that lost more during the pandemic have gained more since. Notice that there is precisely one district (viz. Delano Union Elementary in Kern County) that has gained more than 20 points in 2 years, which is the pace SFUSD would have had to maintain over 5 years to reach its goal. And Delano was just recovering ground it lost during the pandemic.
San Francisco’s score has actually gone down by more since the pandemic than it did during it. It lost just 2 points between 2019 and 2022 but has lost 8.8 points since. That seems like it shouldn’t happen (and later on, I’ll provide an explanation for why it did happen) but the district is hardly alone. There are lots of districts in that bottom-left quadrant. The districts closest to San Francisco on the chart include Cupertino, Irvine, Pleasanton, and San Ramon Valley, which are not usually counted among its peers on account of their much less disadvantaged student populations.
The Evolution of SFUSD’s Student Population
Enough about other districts. Let’s focus on what has happened in San Francisco.
As a rough rule of thumb, SFUSD’s student population is one-third Asian, one-third Latino, one-sixth White, and one-sixth everybody else. Just over half the students are socioeconomically disadvantaged (a term that includes those who are eligible for free or reduced-price meals, those who are homeless or in foster care, and those whose parents did not graduate high school). The CDE reports average scores for each combination of ethnic group and socioeconomic status, as shown in the chart below.
Advantaged[1] White students scored 6 points higher than advantaged Asian students while disadvantaged White students scored 22 points lower than disadvantaged Asian students. Nevertheless, on average, White students did 9 points better than Asian students because a large majority of White students are advantaged and a small majority of Asian students are disadvantaged.
Although the scores for disadvantaged Latino and Black students are almost identical, and the average scores for advantaged Latino and Black students are also almost identical, the average score for all Latino 3rd graders (2,358.3) is 5.5 points higher than the average score for all Black 3rd graders (2,352.8). The reason is that a higher percentage of Latino students are advantaged.
San Francisco’s 3rd grade score of 2422.7 is a weighted average of the scores of each of the subgroups, with the weights determined by the number of students in the subgroup. It follows that, mathematically, there are two ways for a district’s score to increase:
the score of one or more of the subgroups increases e.g. the average score of disadvantaged Latino students goes up;
the share of the students who come from higher-scoring demographic groups increases e.g. a higher proportion of Latino students are advantaged and a smaller proportion are disadvantaged, or a higher proportion of students are Asian and a lower proportion are Latino.
Note that it doesn’t matter for our purpose why some groups score higher than others. It just matters that they do. A district should get neither credit nor blame for something beyond its control, such as the demographics of the students who walk through the door. It should be evaluated based on how it does with the students it has.
In the years since 2019, SFUSD’s 3rd grade demographics changed in ways that would tend to affect its score. For example, the fraction of SFUSD’s 3rd graders who are disadvantaged fell from 53.2% in 2019 to 50.7% in 2022 before jumping up to 56.8% in 2024. The chart below shows in more detail how the composition of SFUSD’s 3rd grade has changed.
Between 2019 and 2022, the pandemic led to a decrease in the fraction of disadvantaged Latino students but an offsetting increase in the fraction of advantaged Latino students. From 2022 to 2024, there was a significant increase in the fraction of disadvantaged Latino students but the fraction of advantaged Latino students was flat. Among Asian students, the changes were smaller but the pattern was similar i.e. a shift to advantaged students during the pandemic that reversed afterwards.
Even if the scores of each subgroup didn’t change at all over this period, SFUSD’s overall score would change because of the changing composition of its students. It would have gone up from 2019 to 2022 because there were fewer disadvantaged students and it would have gone down between 2022 and 2024 because there were more disadvantaged students.
How The Pandemic Affected Scores
Of course, scores did change during the pandemic, and not evenly. Disadvantaged Black and Latino students were hit the hardest, which is ironic given the heated debate that we all remember about whether reopening schools should be a priority. Unfortunately, there hasn’t been much recovery since the pandemic. Scores for Latino students fell even further while those for Black students are still down over 20 points since 2019. Advantaged Latino students, whose scores declined by 4.4 points between 2019 and 2022, declined by a massive 23.6 points in the two years from 2022 to 2024. The kids who were in 3rd grade in 2023-24 were in kindergarten in 2020-21. Zoom school was bad for older kids but it must have been pointless for kindergartners. They effectively missed an entire year of schooling. That said, I would have expected that the effect of a missed school year would have attenuated over the intervening three years. I’m surprised there hasn’t been more of a bounce back.
It’s not all bad news. Advantaged Black students are doing better than they were in 2019. Asian and Filipino[2] students, no matter their socioeconomic level, also did better in 2024 than in 2019. I have a theory why. English Learners score poorly on the ELA portion of the SBAC. If they didn’t, they wouldn’t be English Learners. The fewer English Learners there are in a group, the higher that group’s average score will be. The percentage of Asian 3rd graders who were English Learners fell from 56% in 2019 to 46% in 2022 and to 39% in 2024[3]. That’s a big drop in just a few years. The percentage of Filipino 3rd graders who were English Learners fell from 29% in 2019 to 17% in 2022 to 15% in 2024[4]. It is possible that the reason Asian and Filipino scores have risen is that there are fewer low-scoring English learners in those communities in 2024 than in 2019[5]. To know for sure, we’d have to break out SBAC scores by race and language fluency. Neither CDE nor SFUSD publishes this data.
It would be neat if the decline we observed in the scores of Latino students could be explained by something as simple as an increase in the fraction of English learners, but the evidence is not there. Let’s assume that all Spanish-speaking English learners are Latino. If we just focus on advantaged Latino 3rd graders, we find that the share who are English learners rose from 42% in 2022 to 50% in 2024 and this coincides nicely with the 23.6-point drop in their SBAC score over the same period. Alas, this explanation breaks down when we look at the disadvantaged students. The percentage of disadvantaged Latino students who are English learners actually fell from 88% to 75%, which should have led to an increase in scores but didn’t.
It would be very interesting to drill into the Latino numbers further. Did Latino students who are fluent in English exhibit the same score drop as Latino English learners? Did the score drop vary according to the type of educational program (general education, immersion, or bilingual)? Only SFUSD has this data.
Performance Attribution
We’ve seen that SFUSD’s 3rd grade class has changed: it has more disadvantaged students, particularly more disadvantaged Latino students than it used to. We’ve also seen that the subgroup scores have changed: the scores for Latino students went down while those for Asian and Filipino students went up. We can hold SFUSD responsible for the changing scores of its students but not for the changing composition of its student body. I calculated the effect of each type of change (see the appendix for a description of the calculation).
If the student scores had not changed, the way the student body changed during the pandemic (i.e. there were fewer disadvantaged students) would have added 1.5 points to SFUSD’s 2019 score by 2022. That scores actually declined by 2.0 points over this period means that student performance really declined by 3.5 points.
Demographic change has worked the other way since the pandemic. Even if all the demographic subgroups had continued to score the same, SFUSD’s overall score would have dropped by 5.4 points because the number of disadvantaged students jumped. The observed drop over the 2022-24 period was 8.8 points, meaning that student performance really declined by 3.4 points. 3.4 is not as bad as 8.8 but it’s still a drop. I don’t have any theory why performance would be lower three years after the return to school than it was one year after the return to school.
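The arithmetic behind those two paragraphs is simply "observed change = demographic effect + performance effect", rearranged to solve for the performance effect. A small sketch, using only the figures quoted above (the function and variable names are illustrative, not from any published dataset):

```python
def performance_effect(observed_change, composition_effect):
    """Part of the observed score change not explained by the changing student mix."""
    return observed_change - composition_effect

# Figures quoted in the post, in SBAC scale-score points.
periods = {
    "2019-22": {"observed": -2.0, "composition": +1.5},
    "2022-24": {"observed": -8.8, "composition": -5.4},
}

for label, p in periods.items():
    perf = performance_effect(p["observed"], p["composition"])
    print(f"{label}: performance effect = {perf:+.1f} points")
# 2019-22: performance effect = -3.5 points
# 2022-24: performance effect = -3.4 points
```

Seen this way, the underlying performance decline was almost the same size in both periods; it was the demographic effect that flipped sign.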
Final Thoughts
It’s good that the board is focused on measurable student learning objectives but the 70% target is going to be a millstone until they abandon it.
This year’s 3rd graders didn’t miss a year of school. Their kindergarten year was spent in class but masked up. If masking had no effect on learning, then we should see a bounce back in 3rd grade scores in 2025. If masking did affect learning, any bounce back might be delayed to 2026. That’s not much solace to the disadvantaged Latino and Black students who are now in 4th grade and who are well behind where their peers were before the pandemic.
It’s easy to speculate about why Latino students and disadvantaged Black students were the ones hardest hit by the pandemic. I assume many of the Latino students don’t hear English at home so missing a year of school represents a proportionately greater loss of exposure to English than it does for students who hear English at home or in the playground.
In response to the low scores, SFUSD has decided to change the language mix in its Spanish programs. Historically, these started off at 80% Spanish and only 20% English with more English being added over time. Now, they intend to start off at 50-50. I can understand the urge to change something, because what they’re currently doing clearly isn’t working, but it’s not obvious to me that the same solution is right for both biliteracy and dual-language immersion classes. Biliteracy classes require incoming fluency in Spanish which means the students are almost entirely Latino. Many of the kids in those classes won’t hear English at home and won’t hear it from their classmates so hearing more of it from their teachers makes sense. Dual-language immersion feels different. By design, the class is a mixture of English-speaking and Spanish-speaking students with, ideally, some who are fluent in both and can act as the glue between the two halves of the class. The Spanish-speaking kids in a dual-language immersion class will hear English from their classmates so the need to hear more of it from their teachers shouldn’t be as pressing.
It will take time to tell if the switch to a 50:50 model is successful. The first class to start at 50-50 in kindergarten won’t be taking the SBAC until 2029. Since some bounce back to pre-pandemic SBAC levels can be expected, success will require exceeding those pre-pandemic levels. And, of course, the district is implementing a new literacy curriculum so the only way to determine success will be to compare the trajectory of students in language programs with those in general education programs.
It would also be interesting to see if the switch to a 50:50 model has any effect on how well the English-fluent students learn Spanish but I’m not aware if this is even measured. It’s certainly not reported in any systematic way. In general, I wish SFUSD would report results by type of language class and by English-fluency level. A correspondent kindly sent me an academic paper that addresses this precise topic but the research on which it is based is more than ten years old by now. Annual data reporting would be good. I’d get to test out one of my pet hypotheses which is that English-only kids in dual-language immersion classes do better than English-only kids in general education classes. The second part of my hypothesis is that this is not because of what they’re learning in the language class but because of a selection effect: only parents who are confident of their children’s success in school sign them up for the extra challenge of an immersion class.
Appendix
Here’s a numerical example to illustrate the procedure for estimating how much of the overall SBAC score change is due to changing demographics.
SFUSD’s 3rd graders scored 2431.5 in 2022 and 2422.7 in 2024, a decline of 8.8 points. How much of the change was due to the changing number of disadvantaged students?
In 2022, 50.7% of all students were disadvantaged and they averaged 2395.9. The non-disadvantaged students averaged 2468.1. A weighted average gives the overall score i.e.:
2022: (50.7% * 2395.9) + (49.3% * 2468.1) = 2,431.5
In 2024, disadvantaged students scored 2394.4, not far off their 2022 score, but they now comprised 56.8% of the student body. Non-disadvantaged students scored 2459.7, a drop of 8.4 since 2022. Again, a weighted average gives the overall score for that year:
2024: (56.8% * 2394.4) + (43.2% * 2459.7) = 2,422.7 (it’s off by 0.1 due to rounding)
Now suppose that the scores of disadvantaged and non-disadvantaged students did not change at all between 2022 and 2024 but that the share of disadvantaged students increased as it did in reality. The overall score would then be:
2022 scores with 2024 students: (56.8% * 2395.9) + (43.2% * 2468.1) = 2,427.1
We can then conclude that the student body changes were responsible for a 4.4 point decline (2431.5 - 2427.1 = 4.4). The changing scores of the disadvantaged and non-disadvantaged students must have been responsible for the remaining 4.4 points (i.e. 2427.1 - 2422.7).
This numerical illustration used just two subgroups (disadvantaged and non-disadvantaged students). My calculation used a dozen (disadvantaged and non-disadvantaged Latinos; disadvantaged and non-disadvantaged Asians etc.) so the result shown above (i.e. the subgroup sizes and subgroup scores both being responsible for 4.4 points of decline) is different from what was shown in the chart in the main post. The principle is exactly the same, however.
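For anyone who wants to reproduce the two-subgroup illustration, or extend it to more subgroups, here is a rough Python sketch of the procedure. The function names and dictionary layout are my own assumptions; the shares and scores are the rounded figures from this appendix, so the outputs land within about 0.1 of the numbers in the text rather than matching them exactly.

```python
def weighted_average(shares, scores):
    """Overall score as the share-weighted average of subgroup scores."""
    return sum(shares[group] * scores[group] for group in shares)

def decompose(shares_start, scores_start, shares_end, scores_end):
    """Split the overall change into a composition part and a score part.

    The counterfactual holds subgroup scores at their starting values
    while letting the subgroup shares move to their ending values.
    """
    start = weighted_average(shares_start, scores_start)
    end = weighted_average(shares_end, scores_end)
    counterfactual = weighted_average(shares_end, scores_start)
    return {
        "composition": counterfactual - start,  # effect of the changing student mix
        "scores": end - counterfactual,         # effect of changing subgroup scores
        "total": end - start,
    }

# Rounded figures from the appendix (shares must sum to 1 in each year).
shares_2022 = {"disadvantaged": 0.507, "non_disadvantaged": 0.493}
scores_2022 = {"disadvantaged": 2395.9, "non_disadvantaged": 2468.1}
shares_2024 = {"disadvantaged": 0.568, "non_disadvantaged": 0.432}
scores_2024 = {"disadvantaged": 2394.4, "non_disadvantaged": 2459.7}

result = decompose(shares_2022, scores_2022, shares_2024, scores_2024)
for name, value in result.items():
    print(f"{name}: {value:+.2f}")
```

Because the dictionaries can hold any number of subgroups, the same function would work with a dozen subgroups, provided each year's shares sum to 1 and both years use the same subgroup labels.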
[1] The formal opposite of disadvantaged is non-disadvantaged, not advantaged. Avoiding a double negative aids clarity so I’m going to use advantaged instead.
[2] Only in California are Filipino students not considered Asian.
[3] This calculation assumes that all English Learners whose first language is Cantonese, Mandarin, or Vietnamese are Asian (and not, say, of Two or More Races). The numbers would be higher if I included other Asian languages but (a) I was lazy, and (b) there are not enough of them to change the trend.
[4] Again, this calculation assumes that all English Learners whose first language is a Philippine language are Filipino.
[5] It’s also conceivable that the proportion of English Learners in kindergarten was unchanged but that SFUSD reclassified more of them as fluent before 3rd grade.
Nice article. As a parent, I would also be curious to test out the Immersion class performance vs General Ed performance self-selection hypotheses you mentioned. Perhaps one day that data will become available...
I have no doubt that there is a selection bias in the immersion programs. I imagine one can look to the lottery prioritization to glean some of this.
I am concerned with "It would also be interesting to see if the switch to a 50:50 model has any effect on how well the English-fluent students learn Spanish but I’m not aware if this is even measured." It seems like English-fluent students are not even a consideration in the decision making.