\documentclass[12pt]{article}

\addtolength{\textheight}{2.0in}
\addtolength{\topmargin}{-1.15in}
\addtolength{\textwidth}{1.2in}
\addtolength{\evensidemargin}{-0.75in}
\addtolength{\oddsidemargin}{-0.7in}
\setlength{\parskip}{0.1in}
\setlength{\parindent}{0.0in}

\pagestyle{empty}
\raggedbottom

\newcommand{\given}{\, | \,}

\begin{document}

\begin{flushleft}
Prof.~David Draper \\
Department of \\
\hspace*{0.1in} Applied Mathematics and Statistics \\
University of California, Santa Cruz
\end{flushleft}

\begin{center}
\textbf{\large AMS 206: Quiz 1 \textit{[15 points, plus 4 extra credit points]}}
\end{center}

\begin{tabular}{ll}
\hspace*{-0.14in} Name: \underline{\hspace*{5.0in}} \\
\end{tabular}
\vspace*{0.1in}

Please supply your answers to the questions below in the spaces provided. If your answers extend to more than two pages, please ensure that each continuation answer identifies the question it's answering on the extra page(s), and (if you're using the scanning option for submission) make sure to scan all pages of your solution for uploading to \texttt{canvas.ucsc.edu}.

You're an economist interested in patterns of unemployment of U.S.~adults over time. As part of this interest, You decide to take a sample from the population $\mathcal{ P }$ of people 18 years of age or older who were living in Santa Cruz (city, not county) as of time $T =$ (8 Jan 2018). The most recent U.S.~census, extrapolated to the beginning of 2018, estimates the total population of the city of Santa Cruz at that time as 64,465, and data from the website \texttt{suburbanstats.org} lead to an estimate of $N \doteq $ 54,342 as the total number of those people whose age was at least 18 at time $T$. You decide to take a representative sample of $n = 921$ people from $\mathcal{ P }$ and ask each sampled person ``Do you consider yourself fully employed at the time of this survey?'', with possible responses \{\textit{yes, no, other} (e.g., refuse to answer)\}.
Let $\theta$ be the proportion of the 54,342 people who would have answered \textit{yes} to this question, if You had been able to survey the entire population, and let $s$ (an integer between 0 and $n$, inclusive) be the number of people in Your sample who answer \textit{yes}.

\begin{itemize}

\item[(1)] In class we agreed that the simplest method for obtaining a \textit{representative} sample from a (finite) population is \textit{random sampling}. Given that there's no list of \{all $N$ people, with their addresses and other contact information\} from which You could draw a random sample (which is true; for one thing, what about homeless people?), in practice would it be easy, hard, or in between for You to construct a sample that You and other reasonable people would agree is representative (like a random sample) from the population $\mathcal{ P }$? Explain briefly. \textit{[4 points]}

\textit{Extra credit [4 additional points]:} Describe (on another sheet of paper) how You personally would attempt to obtain an arguably representative sample from $\mathcal{ P }$.
\vspace*{0.8in}

\end{itemize}

For the rest of the problem, let's assume that You have indeed been able to create a sample that's similar to what You would have obtained with random sampling, and that Your results were as follows: $n_{ yes } = s = 830$ people said \textit{yes}, $n_{ no } = 72$ said \textit{no}, and $n_{ other } = 19$ were recorded as \textit{other}.

\begin{itemize}

\item[(2)] Before You get Your sampled data, is the logical status of $\theta$ known or unknown? What about $s$? Now answer both questions again, at a moment in time after Your sample data have arrived.
\textit{[2 points]}

\end{itemize}

\newpage

\begin{itemize}

\item[(3)] In class we saw that calculations relevant to uncertainty quantification were of two types --- \textit{probabilistic} and \textit{statistical} --- and that statistical activities in turn were of four types --- \textit{description}, \textit{inference}, \textit{prediction}, and \textit{decision-making} --- making a total of five classes of methods relevant to AMS 206. For each of the following (\textit{[1 point each]}), identify the activity or calculation as one of these five classes, and briefly explain Your choice.

\begin{itemize}

\item[(a)] After the data are available, You estimate that a future sample survey of size $n_{ future } = 614$ from $\mathcal{ P }$ in early 2019 would contain about $\hat{ n }_{ yes } = 553$ \textit{yes} responses.
\vspace*{0.6in}

\item[(b)] Before the data set arrives, and temporarily pretending that $\theta$ is known, under IID random sampling the sampling distribution (probability mass function) of $s$ given $\theta$ (and $n$) is $( s \given \theta \, \mathcal{ B } ) \sim \textrm{Binomial} ( n, \theta )$, where $\mathcal{ B }$ summarizes the background context of Your sample survey.
\vspace*{0.6in}

\item[(c)] In consultation with You and on the basis of Your survey, the Santa Cruz City Council votes (5 in favor, 2 opposed) to allocate \$57,300 in the fiscal year 2019 budget to be distributed to winning grant proposals for ways to reduce unemployment in the city.
\vspace*{0.6in}

\item[(d)] After the data have been collected, You estimate $\theta$ to be about $\hat{ \theta } = \frac{ s }{ n } = \frac{ 830 }{ 921 } \doteq$ 90.1\%, with a give-or-take of about 1.0\% and a 95\% interval estimate of about $( 88.2\%, 92.0\% )$.
\vspace*{0.6in}

\item[(e)] You summarize Your data set with the vector $( n_{ yes }, n_{ no }, n_{ other } ) = ( 830, 72, 19 )$.
\vspace*{0.6in}

\end{itemize}

\item[(4)] In estimating the unemployment rate in $\mathcal{ P }$ at time $T$, You have to decide what to do about the $n_{ other } = 19$ people who answered \textit{other}. One possible approach is \textit{sensitivity analysis}: at one extreme You could imagine that all 19 of those people would have answered \textit{yes} if they had given a \textit{yes}/\textit{no} answer, and at the other extreme You could imagine them all answering \textit{no}. This defines a range of possible unemployment rate estimates, and if this range is narrow enough You've demonstrated that it doesn't matter much what You do with the \textit{other} people. Compute the lower and upper endpoints of this range with the data set in this problem. Would You say that the effect of the \textit{other} people is negligible here? Explain briefly. \textit{[4 points]}

\end{itemize}

\end{document}