\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amsthm,amssymb}
\usepackage{relsize}
\newcounter{lecnum}
\usepackage{graphicx}
\graphicspath{{./}}
\usepackage{caption}
\usepackage{subcaption}
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\lecture}[4]{
   \newpage
   \setcounter{lecnum}{#1}
   \noindent

   \begin{center}
   \framebox{
      \vbox{\vspace{2mm}
    \hbox to 16cm { {\bf CS761 Derandomization and Pseudorandomness
                        \hfill 2022-23 Sem I} }
       \vspace{4mm}
       \hbox to 16cm { {\Large \hfill Lecture #1: #2  \hfill} }
       \vspace{2mm}
       \hbox to 16cm { {\it Scribe: #4  \hfill  Lecturer: #3} }
      \vspace{2mm}}
   }
   \end{center}
   \vspace*{4mm}
}

\newtheorem{theorem}{Theorem}[lecnum]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}

% custom
\usepackage{enumitem}
\usepackage{hyperref}
\usepackage{cleveref}
\usepackage{mathtools}
\newcommand{\BPP}{\mathsf{BPP}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\PP}{\mathsf{PP}}
\newcommand{\avg}{\text{avg}}
\newcommand{\worst}{\text{worst}}
\newcommand{\WH}{\operatorname{WH}}
\newcommand{\RM}{\operatorname{RM}}
\newcommand{\F}{\mathbb{F}}

\begin{document}

\lecture{19}{17-10-2022}{Rohit Gurjar}{Amit Rajaraman}

	In the last lecture, we saw that the relative distance of the Reed-Muller code is $1 - d/|\F|$ when viewed as a code over the alphabet $\F$. When viewed as a code over the alphabet $\{0,1\}$, however, the guaranteed relative distance drops to $(1 - d/|\F|)/\log|\F|$. This issue of the relative distance being $o(1)$ cannot be fixed even by changing $\F,\ell,d$.
	
	To fix this, we will concatenate the Reed-Muller code with a binary code. Let $x \in \F^{|\F|^\ell}$ be a codeword of the Reed-Muller code.
	For each coordinate $x_i \in \F$, we view it as a binary string in $\{0,1\}^{\log\abs{\F}}$ and then apply a binary code
	$\{0,1\}^{\log\abs{\F}} \to \{0,1\}^{t}$ to it.
	
	%for each element of $\F$, 
	%(each coordinate when viewed as a code on alphabet $\F$), we shall replace it with another codeword, possibly larger. That is, if we encode it as $x \in \F^{|\F|^\ell}$ under the Reed-Muller code, we encode each $x_i$ as another element $\{0,1\}^{t}$, where $t$ will end up being $|\F|$.\\

	This second code is the \emph{Walsh-Hadamard code}, defined as follows. The encoding is a function $\WH : \{0,1\}^k \to \{0,1\}^{2^k}$, where for each $S \subseteq [k]$, we have $(\WH(x))_S = \bigoplus_{i \in S} x_i$.
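
	As a small illustration of this definition, take $k = 2$ and order the coordinates as $\emptyset, \{1\}, \{2\}, \{1,2\}$ (this ordering is our own choice). Then $\WH(x_1x_2) = (0,\; x_1,\; x_2,\; x_1 \oplus x_2)$, so
	\[ \WH(00) = 0000, \quad \WH(10) = 0101, \quad \WH(01) = 0011, \quad \WH(11) = 0110. \]
	Any two of these four codewords differ in exactly $2 = 2^k/2$ of the $4$ coordinates.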
	
	We claim that the relative distance of this code is $1/2$.
	Indeed, if two strings differ on a nonempty set $D \subseteq [k]$ of $r$ positions, their encodings differ on precisely the coordinates $S$
	for which $\abs{S \cap D}$ is odd.
	The number of such subsets is exactly $2^k/2$: fixing some $j \in D$, the map $S \mapsto S \triangle \{j\}$ pairs each subset with $\abs{S \cap D}$ odd with one for which it is even.
	Further, it turns out that the Walsh-Hadamard code is optimal in this respect.

	\begin{proposition}
		Let $E : \{0,1\}^{n} \to \{0,1\}^m$ be a code with $m < 2^n-1$. Then, the relative distance of $E$ is at most $1/2$. % if m \ge 2^n, we can just assign each word in \{0,1\}^n to a codeword has exactly one coordinate 1 and everything else 0.
	\end{proposition}
	\begin{proof}[Proof sketch]
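		Suppose instead that $E$ is a function to $\{-1,1\}^m$ (replacing $0$ with $-1$) with relative distance $\Delta > 1/2$. Then $\langle  E(x),E(y) \rangle < 0$ for any distinct $x,y \in \{0,1\}^{n}$, since the inner product is the number of coordinates on which $E(x)$ and $E(y)$ agree minus the number on which they differ. The number of vectors in $\mathbb{R}^m$ with pairwise negative inner products is at most $m+1 < 2^n$ (see, for example, \href{https://mathoverflow.net/questions/31436/largest-number-of-vectors-with-pairwise-negative-dot-product}{here}), whereas $E$ has $2^n$ distinct codewords, a contradiction.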
	\end{proof}
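
	The hypothesis $m < 2^n - 1$ cannot be dropped, as the following small example (ours, not from the lecture) illustrates. Take $n = 2$ and $m = 3 = 2^n - 1$, and let $E$ be the Walsh-Hadamard code with the coordinate for $S = \emptyset$ removed:
	\[ E(00) = 000, \quad E(10) = 101, \quad E(01) = 011, \quad E(11) = 110. \]
	Any two codewords differ in exactly $2$ of the $3$ coordinates, so the relative distance is $2/3 > 1/2$. Replacing $0$ with $-1$ gives the four vectors
	\[ (-1,-1,-1), \quad (1,-1,1), \quad (-1,1,1), \quad (1,1,-1) \]
	in $\mathbb{R}^3$, whose pairwise inner products all equal $3 - 2 \cdot 2 = -1 < 0$. This is a set of $m + 1 = 4$ vectors in $\mathbb{R}^m$ with pairwise negative inner products, matching the bound used in the proof sketch.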
	
	In fact, a similar argument shows that a binary code with relative distance more than $1/2$ cannot have arbitrarily many codewords. That is, for any constant $\delta > 1/2$, there is a number $m_0$ (depending only on $\delta$) such that any binary code with relative distance $\delta$ has at most $m_0$ codewords.
	
	In addition, the Walsh-Hadamard code is locally decodable. To recover $x_i$ from a possibly corrupted copy of the encoding $\WH(x)$, consider sets of the form $T$ and $T \cup \{i\}$, where $i \not\in T$. XORing the two bits $(\WH(x))_T \oplus (\WH(x))_{T \cup \{i\}}$ gives us $x_i$,
	 provided these particular two bits are not corrupted.
	  When there is corruption, we choose several random sets $T$, perform this operation for each of them, and take the majority of the answers. Suppose
	  the encoding $\WH(x)$ has been corrupted in a $\rho$ fraction of coordinates.
	  The probability that either of the bits at positions $T$ and $T \cup \{i\}$ is corrupted is at most $2\rho$ (by the union bound).
	  Hence, a single trial gives the correct value of $x_i$ with probability at least $1-2\rho$, which is more than half whenever $\rho < 1/4$.
	  We can boost the success probability by repetition.
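
	To quantify the boosting step, here is a sketch under the assumption that the $t$ sets $T_1, \ldots, T_t$ are chosen independently and uniformly at random (the symbols $t$, $\gamma$, $\eta$ are introduced here only for illustration). Write $\gamma = 1/2 - 2\rho > 0$, so that each trial is correct with probability at least $1/2 + \gamma$. By Hoeffding's inequality,
	\[ \Pr[\text{the majority of the $t$ trials is wrong}] \le \exp(-2\gamma^2 t). \]
	For example, if $\rho = 1/8$ then $\gamma = 1/4$ and the bound is $e^{-t/8}$, so $t = \lceil 8 \ln(1/\eta) \rceil$ trials suffice for a failure probability of at most $\eta$. Each trial reads only two bits of the corrupted codeword.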
	  
%	  uncorrupted or both are corrupted (since we get the correct value of $x_i$ in either case) is $(1-\rho)^2 + \rho^2 = 1 - 2\rho(1-\rho)$. When $\rho < (1/2)$, this is more than $1/2$ so the majority value gives the correct value with high probability.\\
	
	In conclusion, our final code is $\WH(\RM(x))$.\footnote{Mildly abusing notation to mean that we apply $\WH$ on a coordinate-by-coordinate basis to $\RM(x)$.} Here, $\WH$ is a mapping $\{0,1\}^{\log|\F|} \to \{0,1\}^{|\F|}$. The relative distance of this code is $(1/2)(1 - d/|\F|)$, which is $\Theta(1)$ for appropriate $d,|\F|$! Writing $\Delta$ for this relative distance, we can handle an error fraction of $\rho \approx \Delta/2 \approx 1/4$. For local decoding, one needs to combine the two local decoding algorithms for Reed-Muller and Walsh-Hadamard.
	One interesting thing is that, due to the previous proposition, we cannot handle an error fraction better than $1/4$ with unique decoding. % using a coding theory-based proof like this.
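
	To see where this relative distance comes from, and to get a feel for the numbers (the parameter choice $d = |\F|/10$ below is only an illustrative assumption): two distinct Reed-Muller codewords differ in at least a $1 - d/|\F|$ fraction of their $|\F|^\ell$ coordinates, and on each such coordinate the two Walsh-Hadamard blocks differ in exactly $|\F|/2$ of their $|\F|$ bits. Hence the fraction of differing bits is at least
	\[ \left(1 - \frac{d}{|\F|}\right) \cdot \frac{1}{2}, \qquad \text{e.g.} \quad \left(1 - \frac{1}{10}\right) \cdot \frac{1}{2} = 0.45 \quad \text{for } d = |\F|/10, \]
	which allows unique decoding from any error fraction below $0.45/2 = 0.225$. The price is a block length of $|\F|^{\ell} \cdot |\F| = |\F|^{\ell+1}$ bits.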

	Now, we have gone from exponential $H_\worst$ to exponential $H_\avg^{1-\rho}$, which in the limiting case is $H_\avg^{3/4}$. How do we go from this to $H_\avg$? We do not delve into the details of this, but the main result used is the following.

	\begin{theorem}[Yao's XOR Lemma]
		Given a function $f : \{0,1\}^n \to \{0,1\}$ and $k \in \N$, define the function $\hat{f} : \{0,1\}^{nk} \to \{0,1\}$ by
		\[ \hat{f}(\overline{x}_1 , \overline{x}_2 , \ldots , \overline{x}_k) = f(\overline{x}_1) \oplus f(\overline{x}_2) \oplus \cdots \oplus f(\overline{x}_k), \]
		where each $\overline{x}_i$ is in $\{0,1\}^n$.\\
		If $\delta > 0$ and $\epsilon > 2(1-\delta)^k$, then
		\[ H_\avg^{(1/2) + \epsilon} (\hat{f}) \ge \frac{\epsilon^2}{400n} H_\avg^{1-\delta}(f). \]
	\end{theorem}
	Given a function with exponentially large $H_\avg^{1-\delta}$, choosing $\epsilon$ appropriately exponentially small (and $k$ large enough that $\epsilon > 2(1-\delta)^k$) does the job. % (about $H_\avg^{1-\delta}(f)^{-1/3}$) does the job.\\
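
	As a rough worked instance of the theorem (a sketch only: the symbol $S$ and the choice $\epsilon = S^{-1/3}$ are illustrative assumptions), write $S = H_\avg^{1-\delta}(f)$ and set $\epsilon = S^{-1/3}$. Since $(1-\delta)^k \le e^{-\delta k}$, the condition $\epsilon > 2(1-\delta)^k$ holds as soon as
	\[ k > \frac{1}{\delta} \ln \frac{2}{\epsilon} = \frac{1}{\delta}\left( \ln 2 + \frac{1}{3} \ln S \right), \]
	and the theorem then gives
	\[ H_\avg^{(1/2) + \epsilon}(\hat{f}) \ge \frac{\epsilon^2}{400n} \, S = \frac{S^{1/3}}{400n}. \]
	For $S = 2^{\Theta(n)}$ and constant $\delta$ (such as $\delta = 1/4$), some $k = O(n)$ suffices, so $\hat{f}$ has input length $nk = O(n^2)$.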

	Alternatively, one way to go directly from $H_\worst$ to $H_\avg$ is to use \emph{local list decoding}. List decoding allows us to go beyond error fraction $\Delta/2$, and in fact arbitrarily close to $\Delta$. Hence, we can boost hardness to $H_\avg^{1/2+\epsilon}$ for any $\epsilon >0$.
	% for the Reed-Muller and Walsh-Hadamard combination we saw earlier in the lecture. 

	% Therefore, if we have a function that has exponential worst-case hardness, $\BPP = \PP$!
\end{document}