What is Simpson's paradox (Python version)?
Simpson's paradox
Simpson's paradox is a phenomenon in probability and statistics, in which a trend appears in several
different groups of data but disappears or reverses when these groups are combined.
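To make the definition concrete, here is a minimal sketch with invented counts (the group names, the options X and Y, and every number are hypothetical and chosen only so the reversal is easy to see). Option Y has the higher success rate inside each group, yet option X looks better once the groups are pooled, because most of X's trials fall in the group where success is common.

```python
# Invented counts, chosen only to make the reversal easy to see.
# Each entry is (successes, trials) for one option within one group.
toy_counts = {
    "group 1": {"X": (80, 100), "Y": (18, 20)},
    "group 2": {"X": (4, 20), "Y": (30, 100)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


# Within every group, option Y has the higher success rate.
for group, options in toy_counts.items():
    print(group, {name: round(rate(*c), 2) for name, c in options.items()})

# Pooling the groups: add successes and trials separately, then divide.
pooled = {}
for options in toy_counts.values():
    for name, (s, t) in options.items():
        prev_s, prev_t = pooled.get(name, (0, 0))
        pooled[name] = (prev_s + s, prev_t + t)

pooled_rates = {name: rate(*c) for name, c in pooled.items()}
print("pooled", {name: round(r, 2) for name, r in pooled_rates.items()})
```

The pooled rate is a weighted average of the group rates, and the weights (how many trials each option has in each group) differ between X and Y. That difference in weights is what allows the ranking to flip.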
Example: UC Berkeley gender bias
One of the best-known examples of Simpson's paradox is a study of gender bias among graduate
school admissions to the University of California, Berkeley. The admission figures for the fall of 1973
showed that men applying were more likely than women to be admitted, and the difference was so large
that it was unlikely to be due to chance.
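import pandas as pd

# Load the 1973 admissions counts (columns used below: Admit, Gender, Dept, Freq)
file = 'https://raw.githubusercontent.com/xie186/Coursera_StatisticsWithPython/master/data/UCBGradAdmData1973.csv'
df = pd.read_csv(file)  # read_csv already returns a DataFrame
df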
Gender bias
Summing the counts over all departments gives the admission rate for each gender. This is the
comparison in which men appear more likely than women to be admitted.
# Total applicants by gender and admission outcome, summed over departments
df_sum = pd.pivot_table(df, values="Freq", index=["Gender"],
                        columns=["Admit"], aggfunc="sum")
# Percentage admitted within each gender
df_sum["%"] = 100 * df_sum["Admitted"] / (df_sum["Admitted"] + df_sum["Rejected"])
df_sum
df_sum["%"].plot(kind="bar")
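The claim that the gap is unlikely to be due to chance can be checked with a chi-square test of independence on the pooled 2x2 table. The sketch below is only an illustration: it hardcodes the pooled counts for the six largest departments as distributed in R's UCBAdmissions dataset (an assumption; the CSV loaded above may differ), and it uses the plain Pearson statistic without continuity correction. The helper name chi_square_2x2 is made up for this example.

```python
import math

# Pooled counts for the six largest departments, as distributed in R's
# UCBAdmissions dataset. Assumption: the CSV loaded above holds the same counts.
admitted = {"Male": 1198, "Female": 557}
rejected = {"Male": 1493, "Female": 1278}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if gender and outcome were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


chi2 = chi_square_2x2(admitted["Male"], rejected["Male"],
                      admitted["Female"], rejected["Female"])
# With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.1f}, p-value = {p_value:.2e}")
```

A very small p-value only says that gender and outcome are associated in the pooled table. It says nothing about why, which is what the department-level breakdown addresses.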
Department level summary
When examining the individual departments, it appeared that six out of 85 departments were
significantly biased against men, whereas only four were significantly biased against women.
In fact, the pooled and corrected data showed a "small but statistically significant bias in
favor of women".[15] The data from the six largest departments are summarized below.
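# Counts per department, split by gender and admission outcome
df_pivot = df.pivot_table(values="Freq", index=["Dept"],
                          columns=["Gender", "Admit"], aggfunc="sum")
# Percentage admitted within each department, separately for each gender
df_pivot["%(Female)"] = 100 * df_pivot[("Female", "Admitted")] / (df_pivot[("Female", "Rejected")] + df_pivot[("Female", "Admitted")])
df_pivot["%(Male)"] = 100 * df_pivot[("Male", "Admitted")] / (df_pivot[("Male", "Rejected")] + df_pivot[("Male", "Admitted")])
df_pivot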
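# Keep only the percentage columns; copy so the rename below does not touch df_pivot
df_perc = df_pivot[["%(Female)", "%(Male)"]].copy()
print(df_perc.columns)  # still a two-level MultiIndex
# Flatten the MultiIndex columns to plain labels for a cleaner plot legend
df_perc.columns = ['%(Female)', '%(Male)']
print(df_perc.columns)
df_perc
df_perc.plot(kind="bar")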
The research paper by Bickel et al. concluded that women tended to apply to competitive
departments with low rates of admission even among qualified applicants (such as in the
English Department), whereas men tended to apply to less-competitive departments with
high rates of admission among the qualified applicants (such as in engineering and chemistry).
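One way to see the effect of the application mix is direct standardization: keep each gender's own department-level admission rates, but weight them by the other gender's distribution of applications. The sketch below is a hedged illustration, not the method used by Bickel et al. It hardcodes the per-department counts as distributed in R's UCBAdmissions dataset (assumed to match the CSV used above), and the function names are made up for this example.

```python
# (admitted, rejected) per department and gender, as distributed in R's
# UCBAdmissions dataset. Assumption: these match the CSV used above.
counts = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def applicants(dept, gender):
    """Number of applicants of one gender to one department."""
    return sum(counts[dept][gender])


def dept_rate(dept, gender):
    """Admission rate of one gender within one department."""
    return counts[dept][gender][0] / applicants(dept, gender)


def overall_rate(gender):
    """Admission rate of one gender pooled over all departments."""
    total_admitted = sum(counts[d][gender][0] for d in counts)
    total_applied = sum(applicants(d, gender) for d in counts)
    return total_admitted / total_applied


def standardized_rate(rate_gender, mix_gender):
    """Rate for rate_gender if its applications were spread like mix_gender's."""
    total = sum(applicants(d, mix_gender) for d in counts)
    return sum(applicants(d, mix_gender) / total * dept_rate(d, rate_gender)
               for d in counts)


male_rate = overall_rate("Male")
female_rate = overall_rate("Female")
female_with_male_mix = standardized_rate("Female", "Male")

print(f"men, pooled:                    {male_rate:.3f}")
print(f"women, pooled:                  {female_rate:.3f}")
print(f"women, with men's dept. choice: {female_with_male_mix:.3f}")
```

With these counts the pooled rate for women is lower than for men, but the standardized rate for women is higher than the pooled rate for men. In other words, the pooled gap comes from where applications were sent rather than from lower admission rates within departments.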