\documentclass[12pt]{article}

\usepackage{amsmath, amssymb, graphicx, psfrag, bm}

\addtolength{\textheight}{2.0in}
\addtolength{\topmargin}{-1.15in}
\addtolength{\textwidth}{1.3in}
\addtolength{\evensidemargin}{-0.75in}
\addtolength{\oddsidemargin}{-0.7in}
\setlength{\parskip}{0.1in}
\setlength{\parindent}{0.0in}
\pagestyle{empty}
\raggedbottom

\newcommand{\given}{\, | \,}

\begin{document}

\begin{flushleft}
Prof.~David Draper \\
Department of \\
\hspace*{0.1in} Applied Mathematics and Statistics \\
University of California, Santa Cruz
\end{flushleft}

\begin{center}
\textbf{\large AMS 206: Quiz 4 \textit{[30 points]} (with important Hint)}
\end{center}

\begin{tabular}{l}
\hspace*{-0.14in} Name: \underline{\hspace*{5.0in}}
\end{tabular}

\vspace*{0.1in}

Please supply your answers to the questions below in the spaces provided. If any of your answers extend onto extra pages, please make sure that each continuation answer identifies the question it's answering, and (if you're using the scanning option for submission) be sure to scan all pages of your solution for uploading to \texttt{canvas.ucsc.edu}.

(Inference for the variance in the Gaussian model with known mean) People in Las Vegas who are experts on the National Football League provide a \textit{point spread} for every football game before it occurs, as a measure of the difference in ability between the two teams (taking account of where the game will be played). For example, if Denver is a 3.5-point favorite to defeat San Francisco, the implication is that betting on whether Denver's final score minus 3.5 points exceeds or falls short of San Francisco's final score is an even-money (50/50) proposition. The top panel of Figure 1 below (based on data from Gelman et al.~(2014)) presents a histogram of the differences $d$ = (actual outcome $-$ point spread) for a sample of $n = 672$ professional football games in the early 1980s, with a Normal density superimposed having the same mean $\bar{ d } = 0.07$ and standard deviation (SD) $s = 13.86$ as the sample (if this distribution didn't have a mean close to 0, the experts would be \textit{uncalibrated}, and you could make money by betting against them). You can see from this figure that the model $( D_i \given \sigma^2 \, { \cal G } \, { \cal B } ) \stackrel{ \mbox{\tiny IID} }{\sim} N ( 0, \sigma^2 )$ is reasonable for the observed differences $d_i$, in which $\cal G$ stands for the Gaussian sampling-distribution assumption (which we're making after looking at the data, so it's not part of $\cal B$).

\textbf{Hint: You will find it much easier to work with the Gaussian variance parameter $\sigma^2$ if You define $\theta = \sigma^2$ and focus all of Your calculations on $\theta$.}

\begin{figure}[t!]
\centering
\caption{\textit{Top panel: differences $d_i$ between observed and predicted American football scores, 1981--1984; bottom panel: prior, likelihood, and posterior for $\theta$ with the data in the top panel and the improper prior.}}
\vspace*{-0.25in}
\includegraphics[ scale = 0.75 ]{ams-206-quiz-4-figure-1.pdf}
\vspace*{-0.25in}
\end{figure}

\begin{itemize}

\item[(1)] Write down the likelihood and log-likelihood functions for $\theta$ in this model \textit{[4 points]}. Show that $\hat{ \theta }_{ MLE } = \frac{ 1 }{ n } \sum_{ i = 1 }^n d_i^2$, which takes the value 191.8 with the data in Figure 1, is both sufficient and the maximum likelihood estimator (MLE) for $\theta$ \textit{[4 points]}. Plot the log-likelihood function for $\theta$ in the range from 160 to 240 with these data, briefly explaining why it should be slightly skewed to the right \textit{[4 points]}.
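\begin{quote}
{\small For concreteness, here is a minimal \texttt{R} sketch of one way such a plot could be drawn; this is only an illustrative sketch (the variable names are made up), and it uses the fact --- which Your answer to this part should establish --- that, up to an additive constant not involving $\theta$, the log likelihood depends on the data only through $n$ and $\hat{ \theta }$, both given above:}

\begin{verbatim}
n <- 672            # sample size from the problem statement
theta.hat <- 191.8  # MLE (1/n) * sum( d_i^2 ), given above

# log likelihood for theta, up to an additive constant:
log.likelihood <- function( theta ) {
  - ( n / 2 ) * log( theta ) - n * theta.hat / ( 2 * theta )
}

theta.grid <- seq( 160, 240, length = 500 )
plot( theta.grid, log.likelihood( theta.grid ), type = 'l',
  xlab = 'theta', ylab = 'log likelihood' )
abline( v = theta.hat, lty = 2 )   # vertical line at the MLE
\end{verbatim}
\end{quote}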
\newpage

\item[(2)] The conjugate prior for $\theta$ in this model turns out to be (You don't have to show this) the \textit{scaled inverse chi-square} distribution,
\begin{equation} \label{e:sichi2-1}
\theta \sim \chi^{ -2 } ( \nu_0, \sigma_0^2 ) , \ \ \ \mbox{i.e.,} \ \ \ p( \theta ) = c \, \theta^{ - \left( \frac{ \nu_0 }{ 2 } + 1 \right) } \exp \left( - \frac{ \nu_0 \, \sigma_0^2 }{ 2 \, \theta } \right) ,
\end{equation}
in which $\nu_0$ is the prior sample size and $\sigma_0^2$ is a prior estimate of $\theta$. In an attempt to be ``non-informative,'' people sometimes work with a version of equation (\ref{e:sichi2-1}) obtained by letting $\nu_0 \rightarrow 0$, namely $p( \theta ) = c_0 \, \theta^{ -1 }$. The resulting prior is \textit{improper} (in the usual sense: it integrates to $\infty$), but it turns out that posterior inferences will be sensible nonetheless (even with sample sizes as small as $n = 1$). Show that with the general $\chi^{ -2 } ( \nu_0, \sigma_0^2 )$ prior in equation (\ref{e:sichi2-1}), the posterior for $\theta$ is $\chi^{ -2 } \left( \nu_0 + n, \frac{ \nu_0 \, \sigma_0^2 + n \, \hat{ \theta } }{ \nu_0 + n } \right)$, and conclude therefore that with the improper prior the posterior distribution is $\chi^{ -2 } ( n, \hat{ \theta } )$ \textit{[4 points]}.

\newpage

\item[(3)] The bottom panel of Figure 1 plots the prior, likelihood, and posterior densities on the same graph, using the data in the top panel of Figure 1 and taking $c_0 = 2.5$ for convenience in the plot. Get \texttt{R} (or some equivalent environment) to reproduce this figure (You'll need to identify which member of the $\chi^{ -2 }$ family the likelihood density is); include Your code as part of Your answer \textit{[4 points]}. Explicitly identify the three curves, and briefly discuss what this plot implies about the updating of information from prior to posterior in this case study \textit{[4 points]}.

\begin{quote}
{\small \fbox{\textbf{Programming notes:}} If You're working with \texttt{R}, You can use the \texttt{scaled.inverse.chisq.density} function I wrote for You, which is posted on the course web page. If You're writing code in a language other than \texttt{R}, note that --- because the data values in this example lead to astoundingly large and small numbers on the original scale --- it's necessary to do all possible computations on the log scale and to wait to transform back to the original scale until the last possible moment (look at my \texttt{R} function to see what I mean). You'll need to be careful to use the correct normalizing constant $c$ in equation (\ref{e:sichi2-1}), which can be found in Appendix A of the Gelman et al.~(2014) book, and You'll need a function that computes $\log \Gamma ( x )$ (such a function should be built into Your programming environment; \textit{don't} compute $\Gamma ( x )$ and take its logarithm [You'll get an overflow error with the data in this problem]).}
\end{quote}
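\begin{quote}
{\small To illustrate the log-scale advice in the notes above, here is a hedged sketch --- emphatically \textit{not} the posted \texttt{scaled.inverse.chisq.density} function; the name and argument list below are made up --- of how a $\chi^{ -2 } ( \nu, \sigma^2 )$ log density could be computed, using the normalizing constant $c = ( \nu / 2 )^{ \nu / 2 } \, \sigma^\nu / \Gamma ( \nu / 2 )$ from Appendix A of Gelman et al.~(2014):}

\begin{verbatim}
# log density of theta ~ chi^{-2}( nu, sigma2 ), computed
# entirely on the log scale; lgamma( ) avoids the overflow
# that computing Gamma( ) and then taking logs would cause

log.sichisq.density <- function( theta, nu, sigma2 ) {
  ( nu / 2 ) * log( nu / 2 ) - lgamma( nu / 2 ) +
    ( nu / 2 ) * log( sigma2 ) -
    ( nu / 2 + 1 ) * log( theta ) -
    nu * sigma2 / ( 2 * theta )
}

# exponentiate only at the last possible moment, e.g., for
# the improper-prior posterior chi^{-2}( n, theta.hat ):
theta.grid <- seq( 160, 240, length = 500 )
posterior <- exp( log.sichisq.density( theta.grid,
  672, 191.8 ) )
\end{verbatim}
\end{quote}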
\vspace*{3.0in}

\item[(4)] It can be shown (You don't need to show this) that (as long as $\nu > 4$)
\begin{equation} \label{e:sichi2-2}
( \theta \given { \cal B } ) \sim \chi^{ -2 } ( \nu, \sigma^2 ) \ \ \ \rightarrow \ \ \ E ( \theta ) = \left( \frac{ \nu }{ \nu - 2 } \right) \sigma^2 \ \ \ \mbox{and} \ \ \ V ( \theta ) = \left[ \frac{ 2 \, \nu^2 }{ ( \nu - 2 )^2 \, ( \nu - 4 ) } \right] \sigma^4 \, .
\end{equation}
Use this to compute the posterior mean and SD for $\theta$ with this data set (using the improper prior), and compare the posterior mean with the MLE, briefly explaining why they're so similar in this case \textit{[6 points]}. (A small \texttt{R} sanity-check sketch for this computation appears after the end of the problem set.)

\end{itemize}
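\begin{quote}
{\small As a numerical sanity check for part (4), equation (\ref{e:sichi2-2}) translates directly into \texttt{R}; the sketch below (with a made-up function name) computes the mean and SD of a generic $\chi^{ -2 } ( \nu, \sigma^2 )$ distribution with $\nu > 4$:}

\begin{verbatim}
# mean and SD of theta ~ chi^{-2}( nu, sigma2 ),
# straight from equation (2); valid only for nu > 4

sichisq.moments <- function( nu, sigma2 ) {
  m <- ( nu / ( nu - 2 ) ) * sigma2
  s <- sqrt( 2 * nu^2 /
    ( ( nu - 2 )^2 * ( nu - 4 ) ) ) * sigma2
  c( mean = m, sd = s )
}

# apply this to the improper-prior posterior of part (2)
\end{verbatim}
\end{quote}

\end{document}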