Robustness of splitting a junta


We say that a Boolean function $f: \{0,1\}^n \to \{0,1\}$ is a $k$-junta if $f$ has at most $k$ influencing variables.

Let $f: \{0,1\}^n \to \{0,1\}$ be a $2k$-junta. Denote the variables of $f$ by $x_1, x_2, \ldots, x_n$. Fix

$S_1 = \left\{ x_1, x_2, \ldots, x_{\frac{n}{2}} \right\},\quad S_2 = \left\{ x_{\frac{n}{2} + 1}, x_{\frac{n}{2} + 2}, \ldots, x_n \right\}.$

Clearly, there exists $S \in \{S_1, S_2\}$ such that $S$ contains at most $k$ of the influencing variables of $f$.

Now let $\epsilon > 0$ and assume that $f: \{0,1\}^n \to \{0,1\}$ is $\epsilon$-far from every $2k$-junta (i.e., at least an $\epsilon$ fraction of the values of $f$ must be changed to make it a $2k$-junta). Can we make a "robust" version of the statement above? That is, is there a universal constant $c$ and a set $S \in \{S_1, S_2\}$ such that $f$ is $\frac{\epsilon}{c}$-far from every function that has at most $k$ influencing variables in $S$?

Note: In the original formulation of the question, $c$ was fixed as $2$. Neal's example shows that this value of $c$ does not suffice. However, since in property testing we are usually not too concerned with constants, I relaxed the condition a bit.

Can you clarify your terms? Is a variable "influencing" iff the value of $f$ is not always independent of the variable? Does "change a value of $f$" mean changing one of the values $f(x)$ for some particular $x$?

Neal Young

Sure, a variable $x_i$ is influencing if there exists an $n$-bit string $y$ such that $f(y) \neq f(y')$, where $y'$ is the string $y$ with its $i$-th coordinate flipped. Changing a value of $f$ means changing its truth table.
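The definition of an influencing variable can be checked by brute force for small $n$. The following is a minimal sketch, assuming $f$ is given as a Python function on tuples of bits; the names `influencing_variables`, `is_junta`, and `example` are hypothetical and chosen only for illustration.

```python
from itertools import product

def influencing_variables(f, n):
    """Return the 0-based indices i such that flipping bit i changes f(y) for some y."""
    result = []
    for i in range(n):
        for y in product((0, 1), repeat=n):
            y_flipped = y[:i] + (1 - y[i],) + y[i + 1:]
            if f(y) != f(y_flipped):
                result.append(i)
                break  # one witness y is enough for variable i
    return result

def is_junta(f, n, k):
    """True if f has at most k influencing variables."""
    return len(influencing_variables(f, n)) <= k

# Hypothetical example: f depends on bits 0, 2 and 3 out of 5.
example = lambda y: y[0] ^ (y[2] & y[3])
print(influencing_variables(example, 5))  # [0, 2, 3]
```

The loop examines all $2^n$ inputs for each variable, so it is only practical for small $n$.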


The answer is "yes". The proof is by contradiction.

For notational convenience, denote the first $n/2$ variables by $x$ and the second $n/2$ variables by $y$. Suppose that $f(x,y)$ is $\delta$-close to a function $f_1(x,y)$ that depends on at most $k$ coordinates of $x$ (and arbitrarily on $y$). Denote these coordinates by $T_1$. Similarly, suppose that $f(x,y)$ is $\delta$-close to a function $f_2(x,y)$ that depends on at most $k$ coordinates of $y$ (and arbitrarily on $x$). Denote these coordinates by $T_2$. We need to prove that $f$ is $4\delta$-close to a $2k$-junta $\tilde f(x,y)$.

Let us say that $(x_1,y_1) \sim (x_2,y_2)$ if $x_1$ and $x_2$ agree on all coordinates in $T_1$, and $y_1$ and $y_2$ agree on all coordinates in $T_2$. Pick a representative uniformly at random from each equivalence class. Let $(\bar x, \bar y)$ be the representative of the class of $(x,y)$. Define $\tilde f$ as follows:

$\tilde f(x,y) = f(\bar x, \bar y).$

Clearly, $\tilde f$ is a $2k$-junta (it depends only on the variables in $T_1 \cup T_2$). We will prove that, in expectation, it is at distance at most $4\delta$ from $f$.

We want to prove that

$\Pr_{\tilde f}(\Pr_{x,y}(\tilde f(x,y) \neq f(x,y))) = \Pr(f(\bar x, \bar y) \neq f(x,y)) \leq 4\delta,$

where $x$ and $y$ are chosen uniformly at random. Consider the random vector $\tilde x$ obtained from $x$ by keeping all bits in $T_1$ and randomly flipping all bits not in $T_1$ (that is, resampling them uniformly), and the vector $\tilde y$ defined similarly with $T_2$ in place of $T_1$. Note that

$\Pr(\tilde f(x,y) \neq f(x,y)) = \Pr(f(\bar x, \bar y) \neq f(x,y))= \Pr(f(\tilde x, \tilde y) \neq f(x,y)).$

We have

$\Pr(f(x,y) \neq f(\tilde x, y)) \leq \Pr(f(x,y) \neq f_1(x, y)) + \Pr(f_1(x,y) \neq f_1(\tilde x, y)) + \Pr(f_1(\tilde x,y) \neq f(\tilde x, y)) \leq \delta + 0 + \delta = 2\delta.$

The middle term is $0$ because $x$ and $\tilde x$ agree on $T_1$, and the last term is at most $\delta$ because $(\tilde x, y)$ is itself uniformly distributed.

Similarly, $\Pr(f(\tilde x,y) \neq f(\tilde x, \tilde y)) \leq 2\delta$. Combining the two bounds, we have

$\Pr(f(\bar x, \bar y) \neq f(x,y)) \leq 4\delta.$ QED

It is easy to "derandomize" this proof. For every $(x,y)$, let $\tilde f(x,y) = 1$ if $f(x',y') = 1$ for most $(x',y')$ in the equivalence class of $(x,y)$, and $\tilde f(x,y) = 0$ otherwise.
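The derandomized construction can be sketched in a few lines for small $n$. This is an illustration under stated assumptions, not part of the proof: the function is given as a Python callable on bit tuples, `T` plays the role of $T_1 \cup T_2$ as 0-based indices, and `noisy_and` is a hypothetical test function (an AND of two bits with one corrupted entry).

```python
from itertools import product

def majority_junta(f, n, T):
    """Truth table of the majority-vote junta on the coordinates in T."""
    counts = {}  # class key -> [number of ones of f in the class, class size]
    for x in product((0, 1), repeat=n):
        key = tuple(x[i] for i in T)
        entry = counts.setdefault(key, [0, 0])
        entry[0] += f(x)
        entry[1] += 1
    table = {}
    for x in product((0, 1), repeat=n):
        ones, size = counts[tuple(x[i] for i in T)]
        table[x] = 1 if 2 * ones > size else 0  # ties go to 0, as in the text
    return table

def distance(f, table, n):
    """Fraction of inputs on which f and the truth table disagree."""
    points = list(product((0, 1), repeat=n))
    return sum(f(x) != table[x] for x in points) / len(points)

def noisy_and(x):
    """Hypothetical example: x[0] AND x[3], with the all-ones input flipped."""
    value = x[0] & x[3]
    return 1 - value if all(x) else value

N_VARS = 6
T = [0, 3]  # one coordinate in each half, standing in for T_1 and T_2
tilde_f = majority_junta(noisy_and, N_VARS, T)
dist = distance(noisy_and, tilde_f, N_VARS)
print(dist)  # 0.015625, i.e. 1/64
```

In this example the majority vote recovers the underlying AND, so the only disagreement is the single corrupted input.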

Yury


The smallest $c$ that the bound holds for is $c = \frac{1}{\sqrt 2 - 1} \approx 2.41$ .

Lemmas 1 and 2 show that the bound holds for this $c$ . Lemma 3 shows that this bound is tight.

(In comparison, Yury's elegant probabilistic argument gives $c=4$.)

Let $c=\frac{1}{\sqrt 2 - 1}$ . Lemma 1 gives the upper bound for $k=0$ .

Lemma 1: If $f$ is $\epsilon_g$-near a function $g$ that has no influencing variables in $S_2$, and $f$ is $\epsilon_h$-near a function $h$ that has no influencing variables in $S_1$, then $f$ is $\epsilon$-near a constant function, where $\epsilon \le c \cdot \frac{\epsilon_g+\epsilon_h}{2}$.

Proof. Let $\epsilon$ be the distance from $f$ to a constant function. Suppose for contradiction that $\epsilon$ does not satisfy the claimed inequality. Let $y=(x_1,x_2,\ldots,x_{n/2})$ and $z=(x_{n/2+1},\ldots,x_n)$ and write $f$, $g$, and $h$ as $f(y,z)$, $g(y,z)$ and $h(y,z)$, so $g(y,z)$ is independent of $z$ and $h(y,z)$ is independent of $y$.

(I find it helpful to visualize $f$ as the edge-labeling of the complete bipartite graph with vertex sets $\{y\}$ and $\{z\}$ , where $g$ gives a vertex-labeling of $\{y\}$ , and $h$ gives a vertex-labeling of $\{z\}$ .)

Let $g_0$ be the fraction of pairs $(y,z)$ such that $g(y,z) = 0$ . Let $g_1=1-g_0$ be the fraction of pairs such that $g(y,z) = 1$ . Likewise let $h_0$ be the fraction of pairs such that $h(y,z) = 0$ , and let $h_1$ be the fraction of pairs such that $h(y,z) = 1$ .

Without loss of generality, assume that, for any pair such that $g(y,z) = h(y,z)$, it also holds that $f(y,z) = g(y,z) = h(y,z)$. (Otherwise, toggling the value of $f(y,z)$ decreases both $\epsilon_g$ and $\epsilon_h$ by $1/2^n$, while decreasing $\epsilon$ by at most $1/2^n$, so the resulting function is still a counter-example.) Say any such pair is "in agreement".

The distance from $f$ to $g$ plus the distance from $f$ to $h$ is the fraction of $(y,z)$ pairs that are not in agreement. That is, $\epsilon_g + \epsilon_h = g_0 h_1 + g_1 h_0$.

The distance from $f$ to the all-zero function is at most $1 - g_0 h_0$ .

The distance from $f$ to the all-ones function is at most $1-g_1 h_1$ .

Further, the distance from $f$ to the nearest constant function is at most $1/2$ .

Thus, the ratio $\epsilon/(\epsilon_g+\epsilon_h)$ is at most

$\frac{\min(1/2, 1-g_0 h_0, 1-g_1 h_1)}{g_0 h_1 + g_1 h_0},$

where $g_0,h_0 \in [0,1]$ and $g_1 = 1-g_0$ and $h_1=1-h_0$.

By calculation, this ratio is at most $\frac{1}{2(\sqrt 2 - 1)} = c/2$ . QED
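The "by calculation" step can be sanity-checked numerically. The sketch below evaluates the ratio from the proof on a grid of $(g_0, h_0)$ values and compares the largest value with $c/2$; the grid resolution `STEPS` is an arbitrary assumption, and a grid search is only a plausibility check, not a proof.

```python
import math

C = 1 / (math.sqrt(2) - 1)  # the constant c from the answer, about 2.414

def ratio(g0, h0):
    """The bound on eps / (eps_g + eps_h) from the proof of Lemma 1."""
    g1, h1 = 1 - g0, 1 - h0
    numerator = min(0.5, 1 - g0 * h0, 1 - g1 * h1)
    denominator = g0 * h1 + g1 * h0
    return numerator / denominator

STEPS = 400  # hypothetical grid resolution
best = max(
    (ratio(i / STEPS, j / STEPS), i / STEPS, j / STEPS)
    for i in range(STEPS + 1)
    for j in range(STEPS + 1)
    # The denominator vanishes only at the corners (0, 0) and (1, 1).
    if (i, j) not in ((0, 0), (STEPS, STEPS))
)
print(best[0], C / 2)
```

On the curve $g_0 h_0 = 1/2$ the numerator equals $1/2$ and the denominator equals $g_0 + h_0 - 1$, which is smallest at $g_0 = h_0 = 1/\sqrt 2$; this is where the value $\frac{1}{2(\sqrt 2 - 1)}$ comes from.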

Lemma 2 extends Lemma 1 to general $k$ by arguing pointwise, over every possible setting of the $2k$ influencing variables. Recall that $c=\frac{1}{\sqrt 2 - 1}$ .

Lemma 2: Fix any $k$. If $f$ is $\epsilon_g$-near a function $g$ that has at most $k$ influencing variables in $S_2$, and $f$ is $\epsilon_h$-near a function $h$ that has at most $k$ influencing variables in $S_1$, then $f$ is $\epsilon$-near a function $\hat f$ that has at most $2k$ influencing variables, where $\epsilon \le c \cdot \frac{\epsilon_g+\epsilon_h}{2}$.

Proof. Express $f$ as $f(a,y,b,z)$ where $(a,y)$ contains the variables in $S_1$ with $a$ containing those that influence $h$ , while $(b,z)$ contains the variables in $S_2$ with $b$ containing those influencing $g$ . So $g(a,y,b,z)$ is independent of $z$ , and $h(a,y,b,z)$ is independent of $y$ .

For each fixed value of $a$ and $b$ , define $F_{ab}(y,z) = f(a,y,b,z)$ , and define $G_{ab}$ and $H_{ab}$ similarly from $g$ and $h$ respectively. Let $\epsilon^g_{ab}$ be the distance from $F_{ab}$ to $G_{ab}$ (restricted to $(y,z)$ pairs). Likewise let $\epsilon^h_{ab}$ be the distance from $F_{ab}$ to $H_{ab}$ .

By Lemma 1, there exists a constant $c_{ab}$ such that the distance (call it $\epsilon_{ab}$) from $F_{ab}$ to the constant function $c_{ab}$ is at most $c(\epsilon^h_{ab} + \epsilon^g_{ab})/2$. Define $\hat f(a,y,b,z) = c_{ab}$.

Clearly $\hat f$ depends only on $a$ and $b$ (and thus on at most $2k$ variables).

Let $\epsilon_{\hat f}$ be the average, over the $(a,b)$ pairs, of the $\epsilon_{ab}$ 's, so that the distance from $f$ to $\hat f$ is $\epsilon_{\hat f}$ .

Likewise, the distances from $f$ to $g$ and from $f$ to $h$ (that is, $\epsilon_g$ and $\epsilon_h)$ are the averages, over the $(a,b)$ pairs, of, respectively, $\epsilon^g_{ab}$ and $\epsilon^h_{ab}$ .

Since $\epsilon_{ab} \le c(\epsilon^h_{ab} + \epsilon^g_{ab})/2$ for all $a, b$, it follows that $\epsilon_{\hat f} \le c(\epsilon_g + \epsilon_h)/2$. QED

Lemma 3 shows that the constant $c$ above is the best you can hope for (even for $k=0$ and $\epsilon=0.5$ ).

Lemma 3: There exists $f$ such that $f$ is $(0.5/c)$ -near two functions $g$ and $h$ , where $g$ has no influencing variables in $S_2$ and $h$ has no influencing variables in $S_1$ , and $f$ is $0.5$ -far from every constant function.

Proof. Let $y$ and $z$ be $x$ restricted to, respectively, $S_1$ and $S_2$ . That is, $y=(x_1,\ldots,x_{n/2})$ and $z=(x_{n/2+1},\ldots,x_n)$ .

Identify each possible $y$ with a unique element of $[N]$ , where $N=2^{n/2}$ . Likewise, identify each possible $z$ with a unique element of $[N]$ . Thus, we think of $f$ as a function from $[N]\times[N]$ to $\{0,1\}$ .

Define $f(y,z)$ to be 1 iff $\max(y,z) \ge \frac{1}{\sqrt 2}N$ .

By calculation, the fraction of $f$ 's values that are zero is $(\frac{1}{\sqrt 2})^2 = \frac{1}{2}$ , so both constant functions have distance $\frac{1}{2}$ to $f$ .

Define $g(y,z)$ to be 1 iff $y\ge \frac{1}{\sqrt 2}N$. Then $g$ has no influencing variables in $S_2$. The distance from $f$ to $g$ is the fraction of pairs $(y,z)$ such that $y<\frac{1}{\sqrt 2}N$ and $z\ge \frac{1}{\sqrt 2}N$. By calculation, this is at most $\frac{1}{\sqrt 2}(1-\frac{1}{\sqrt2}) = 0.5/c$.

Similarly, the distance from $f$ to $h$ , where $h(y,z)=1$ iff $z\ge \frac{1}{\sqrt 2}N$ , is at most $0.5/c$ .

QED
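The construction in Lemma 3 can be reproduced numerically for a small instance. This is a rough sketch under assumptions: `HALF_BITS` (standing for $n/2$) is an arbitrary choice, $[N]$ is identified with $0,\ldots,N-1$, and the threshold $N/\sqrt 2$ is rounded up to an integer, so the distances match $0.5$ and $0.5/c$ only approximately for finite $N$.

```python
import math

C = 1 / (math.sqrt(2) - 1)       # the constant c from the answer
HALF_BITS = 8                    # hypothetical value of n/2
N = 2 ** HALF_BITS
T = math.ceil(N / math.sqrt(2))  # integer threshold approximating N / sqrt(2)

def f(y, z):
    return int(max(y, z) >= T)

def g(y, z):
    return int(y >= T)  # ignores z, so no influencing variables in S_2

def h(y, z):
    return int(z >= T)  # ignores y, so no influencing variables in S_1

pairs = [(y, z) for y in range(N) for z in range(N)]

def distance(p, q):
    """Fraction of (y, z) pairs on which p and q disagree."""
    return sum(p(y, z) != q(y, z) for y, z in pairs) / len(pairs)

dist_zero = distance(f, lambda y, z: 0)
dist_one = distance(f, lambda y, z: 1)
dist_g = distance(f, g)
dist_h = distance(f, h)
print(dist_zero, dist_one, dist_g, dist_h, 0.5 / C)
```

Increasing `HALF_BITS` brings the printed distances closer to the limiting values in the lemma, at the cost of a quadratically larger table.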

Neal Young

First of all, thanks Neal! This indeed sums it up for $k=0$, and sheds some light on the general problem. However, in the case of $k=0$ the problem is a bit degenerate (as $2k=k$), so I'm more curious regarding the case of $k \ge 1$. I didn't manage to extend this claim for $k>0$, so if you have an idea on how to do it, I'd appreciate it. If it simplifies the problem, then the exact constants are not crucial; that is, $\epsilon/2$-far can be replaced by $\epsilon/c$-far, for some universal constant $c$.


I've edited it to add the extension to general $k$. And Yury's argument below gives a slightly looser factor with an elegant probabilistic argument.

Neal Young

Sincere thanks Neal! This line of reasoning is quite enlightening.
