Berkeley's 1973 Graduate Admissions Dataset
The "Berkeley Dataset" contains all 12,763 applicants to UC-Berkeley's graduate programs in Fall 1973. UC-Berkeley researchers published this dataset as part of an analysis of possible gender bias in admissions, and it has since become a classic example of Simpson's Paradox.
- Dataset Format: Well-formatted CSV with column headers as the first row
- Dataset Size: 12,763 rows × 4 columns
- CSV File Location: https://waf.cs.illinois.edu/discovery/berkeley.csv
- Dataset Variables:
  - `Year`: number ➜ The application year (this data is always `1973`)
  - `Major`: string ➜ An anonymized major code (either `A`, `B`, `C`, `D`, `E`, `F`, or `Other`). The specific majors are unknown except that `A`-`F` are the six majors with the most applicants in Fall 1973
  - `Gender`: string ➜ Applicant self-reported gender (either `M` or `F`)
  - `Admission`: string ➜ Admission decision (either `Rejected` or `Accepted`)
- Research Paper: Sex Bias in Graduate Admissions: Data from Berkeley by P. J. Bickel, E. A. Hammel, and J. W. O'Connell (1975)
Using the Berkeley Dataset in Python
The dataset can be loaded using the `pandas` library in Python:
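The loading snippet itself is not preserved on this page, so the following is a minimal sketch rather than the original code. It assumes only the CSV location listed above and the standard `pandas.read_csv` function; the name `load_berkeley` is a hypothetical helper introduced here for illustration.

```python
import pandas as pd

# CSV location as listed in the dataset description above.
BERKELEY_CSV_URL = "https://waf.cs.illinois.edu/discovery/berkeley.csv"


def load_berkeley(source=BERKELEY_CSV_URL) -> pd.DataFrame:
    """Read the Berkeley admissions CSV into a DataFrame.

    `source` may be a URL, a local file path, or an open file-like object.
    The first row of the CSV is used as the column headers (the
    read_csv default).
    """
    return pd.read_csv(source)
```

Calling `df = load_berkeley()` and then evaluating `df` in a notebook should display a preview of the first and last rows, similar to the table that follows.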
| | Year | Major | Gender | Admission |
|---|---|---|---|---|
| 0 | 1973 | C | F | Rejected |
| 1 | 1973 | B | M | Accepted |
| 2 | 1973 | Other | F | Accepted |
| 3 | 1973 | Other | M | Accepted |
| 4 | 1973 | Other | M | Rejected |
| ... | ... | ... | ... | ... |
| 12758 | 1973 | Other | M | Accepted |
| 12759 | 1973 | D | M | Accepted |
| 12760 | 1973 | Other | F | Rejected |
| 12761 | 1973 | Other | M | Rejected |
| 12762 | 1973 | Other | M | Accepted |
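Since this dataset is typically used to illustrate Simpson's Paradox, a natural first step is to compare admission rates by gender across all applicants and then within each major. The sketch below assumes the column names and values described in the variable list above; `admission_rate` is a hypothetical helper written for this example, not a pandas function or part of the original analysis.

```python
import pandas as pd


def admission_rate(df: pd.DataFrame, by: list[str]) -> pd.Series:
    """Return the fraction of applicants accepted within each group.

    `by` is a list of column names to group on, for example
    ["Gender"] or ["Major", "Gender"].
    """
    # Boolean column: True where the decision was "Accepted".
    # The mean of a boolean column is the share of True values.
    flagged = df.assign(Accepted=df["Admission"].eq("Accepted"))
    return flagged.groupby(by)["Accepted"].mean()
```

With a DataFrame `df` loaded from the CSV, `admission_rate(df, ["Gender"])` gives the aggregate rate for each gender, while `admission_rate(df, ["Major", "Gender"])` breaks the same rates down by major. Comparing the two results is how the paradox is usually examined.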
Pages Using the Berkeley Dataset
- Learn Page: Simpson's Paradox