The selected datasets are from the Stanford Education Data Archive (SEDA). SEDA is part of the Educational Opportunity Project at Stanford University and is intended to help improve educational opportunities for children. The construction of this archive has been funded through grants from the Institute of Education Sciences, the Bill and Melinda Gates Foundation, the William T. Grant Foundation, the Spencer Foundation, the Overdeck Family Foundation, a visiting scholar fellowship from the Russell Sage Foundation, the Carnegie Corporation of New York, Bloomberg Philanthropies, and Kenneth C. Griffin.
SEDA 5.0 contains files on technical documentation and codebooks, test score estimates, covariate data, and ancillary data for 2008-19. It records academic achievement (for all students and by subgroup), achievement gaps, as well as demographic and socioeconomic data. The subgroups examined are race/ethnicity (Asian, Black, Hispanic, Native American, White), gender, and socioeconomic disadvantage.
Test score files contain test score data for six zone units: schools, geographic/administrative school districts, counties, commuting zones, metropolitan statistical areas, and states. Average academic achievement is measured using standardized tests in mathematics and reading language arts administered to grades 3 through 8 during the 2008-09 through 2018-19 school years.
School-level test score files are provided on two scales, the cohort standardized and the grade cohort standardized scales, and are pooled over subjects, grades, and years. The variables correspond to the average test score in the middle grade, learning rates across grades, score trends across cohorts, and the difference between math and language scores. The school files hold estimates for all students only, not for subgroups.
There are additional test score files corresponding to the six zone units by the two scales and by three pooling levels. All of these files have estimates for all students and by demographic subgroup. Estimates for each grade and year separately are held in the “long” files, which include grade-year-subject means and standard errors. The estimates in the “pooled by subject” files are averaged across grades and years within each subject. These hold the average test score mean, the average learning rate across grades, and the average test score trend across cohorts for each of the two subjects. “Pooled overall” files contain estimates averaged across subject, grade, and year. They hold the average test score mean, learning rate across grades, test score trend across cohorts, and difference in scores between subjects. A standard error is recorded alongside each estimate.
The covariate data provides information on the demographic, socioeconomic, and segregation characteristics of the six zone units. There are two school files: one contains a record for each school that is the average across years, and the other contains a record for each school in each year. Within the data files for the six zone units, the same variables are present in different forms: averaged across grades, averaged across grades and years, and separated by grade and year.
SEDA 5.0 test score data is sourced from the EDFacts data system of the U.S. Department of Education. The aggregated data is generated from state standardized testing programs required of students in grades 3 through 8, assessing performance in math and reading language arts. The covariate data is primarily from three sources. Data from the Common Core of Data, generated from an annual survey of all U.S. public elementary schools, secondary schools, and school districts, includes descriptive information about schools and districts, including demographic subgroups. The data pulled from the Civil Rights Data Collection concerns school demographics, school expenditure, teacher experience, and high school course enrollment. American Community Survey data, obtained through the National Historical Geographic Information System, provides socioeconomic and demographic information on individuals and residences in each unit except the school level. The National Center for Education Statistics also provided some data.
SEDA 2023 was produced in partnership with Harvard’s Center for Education Policy Research. Built on a similar framework of technical documentation, test score files, and covariate data files, it compares average school district achievement before and during the Covid-19 pandemic, covering the years 2019 through 2023 at the district and state levels. SEDA 2023 was constructed using data from EDFacts for 2019 and state accountability data for the following years. Assessment data was gathered from individual states, and data from the National Center for Education Statistics and the National Assessment Governing Board was used to rescale state proficiency data. Some state-reported data is excluded because of variations in state suppression rules, test content, and the inclusion or exclusion of subgroups, and because not all states reported usable data.
SEDA can be used to illuminate achievement gaps by gender (recorded as a binary), race/ethnicity, and socioeconomic status, and to show how school district zoning determines access to free/reduced-price lunch. It can also reveal which subgroups were most affected by the pandemic as access to these opportunities was cut off. Pandemic-related loss can be made visible through changes in academic achievement between 2019 and 2022. Test score changes from 2022 to 2023 can also show the extent of academic recovery as districts returned from distance to in-person learning, and may illuminate effects of long-term Covid-19 symptoms on information retention and learning.
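The loss and recovery comparisons described above come down to simple differences between yearly averages. The following is a minimal sketch of that arithmetic in Python, not SEDA's own procedure. The district names, the scores, and the helper `pandemic_changes` are all invented for illustration and are not taken from any SEDA file.

```python
# Toy district-level mean scores keyed by spring test year.
# Every name and value here is invented for illustration; none are SEDA estimates.
scores = {
    "District A": {2019: 0.20, 2022: -0.10, 2023: 0.00},
    "District B": {2019: -0.30, 2022: -0.50, 2023: -0.45},
}


def pandemic_changes(by_year):
    """Return (loss, recovery) for one district (hypothetical helper).

    loss     = 2022 mean minus 2019 mean (negative means scores fell)
    recovery = 2023 mean minus 2022 mean (positive means scores rose)
    """
    loss = by_year[2022] - by_year[2019]
    recovery = by_year[2023] - by_year[2022]
    return loss, recovery


# Compute both differences for every district in the toy table.
changes = {name: pandemic_changes(by_year) for name, by_year in scores.items()}

for name, (loss, recovery) in changes.items():
    print(f"{name}: change 2019-22 = {loss:+.2f}, change 2022-23 = {recovery:+.2f}")
```

A real analysis would also need to carry the standard errors recorded alongside each estimate, which this sketch leaves out.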
SEDA cannot reveal information on individual students, as no individual student-level data was collected from EDFacts. Data from 2020-21 is not comparable to other years due to limited testing. Additionally, SEDA 2023 does not provide information regarding Asian or Native American students. Immunocompromised or disabled students and students outside the gender binary are also not included as subgroups in SEDA. SEDA’s interpretation of “educational opportunity” is limited to free/reduced-price school lunch. There is also no data on measures of student performance or engagement beyond standardized test scores in math and language, such as report card grades or extracurricular activities.
Using only standardized test scores for the test score data files promotes this form of testing as the ideal means of assessing student academic performance. As SEDA receives grants from the federal government, its reliance on this federally required testing program may reflect a bias. The narrow range of tested subjects could also affect the data, as it does not account for other subjects or for non-assessment-based measures of academic achievement. Additionally, these tests vary from state to state in their designs and proficiency thresholds, reflecting state values. The archive’s limited scope of “educational opportunity” may also influence how the data is interpreted.