Is the language of pairs of equal-length words whose Hamming distance is 2 or more context-free?

Is the following language context-free?

$L = \{ uxvy \mid u,v,x,y \in \{ 0,1 \}^+, |u| = |v|, u \neq v, |x| = |y|, x \neq y\}$

As sdcvvc points out, a word in this language can also be described as the concatenation of two words of equal length whose Hamming distance is 2 or more.
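The equivalence of the two descriptions can be checked by brute force on short strings. The sketch below is only a sanity check, not a proof; the function names (`in_L_split`, `in_L_hamming`, `descriptions_agree`) are made up for illustration.

```python
from itertools import product

def hamming(u, v):
    """Hamming distance of two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))

def in_L_hamming(w):
    """w = uv with |u| = |v| and d(u, v) >= 2."""
    if len(w) % 2:
        return False
    h = len(w) // 2
    return hamming(w[:h], w[h:]) >= 2

def in_L_split(w):
    """w = uxvy with u, v, x, y nonempty, |u| = |v|, |x| = |y|, u != v, x != y."""
    n = len(w)
    for a in range(1, n):            # a = |u| = |v|
        for b in range(1, n):        # b = |x| = |y|
            if 2 * (a + b) != n:
                continue
            u, x = w[:a], w[a:a + b]
            v, y = w[a + b:2 * a + b], w[2 * a + b:]
            if u != v and x != y:
                return True
    return False

def descriptions_agree(max_len):
    """Compare both definitions on every binary string up to max_len."""
    return all(
        in_L_hamming(w) == in_L_split(w)
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
    )
```

Calling `descriptions_agree(10)` compares the two definitions on all binary strings of length at most 10.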

I think it is not context-free, but I'm having a hard time proving it. I've tried intersecting this language with a regular language (such as $0^*1^*0^*1^*$) and then using the pumping lemma and/or homomorphisms, but I always end up with a language that is too complicated to characterize and write down.

formal-languages context-free pushdown-automata pumping-lemma open-problem Robert777

Have you tried pumping the string $0^u1^x1^u0^x$?

Pål GD

Yes, but I did not manage to pump this string out of the language (that doesn't mean it is impossible, only that I haven't managed it).

Robert777

@PålGD, you will probably need a way to "mark" the pieces, e.g. $1^u 0 1^x 0 1^u 0 1^x 0$

vonbrand

This language can be written as $\{uv:|u|=|v|,d(u,v) \geq 2\}$ where $d$ is the Hamming distance. Note that if we replace 2 by 1, the language is context-free ( cs.stackexchange.com/questions/307 ), but the trick used there will not work. Personally, I would bet that it is not context-free.

sdcvvc

@sdcvvc: You are right: one splits $u$ into $u'x$ so that one of the differing bits is in $u'$ and the other is in $x$. I stand corrected.

András Salamon

Answers:

Note [2019-07-30]: the proof is incorrect ... the question is harder than it seems.

After a failed attempt, here is another idea.

If we intersect $L$ with the regular language $L_{reg} = 0^*10^*10^*10^*$ we get a CF language.

Perhaps we will have more luck if we use $L_{reg}' = 0^*10^*10^*10^*10^*$ (strings with exactly four 1s).

Let $L_1 = L \cap L_{reg}'$; informally, $w \in L_1$ if it can be split into two halves such that one half contains exactly $0$, $1$, $3$ or $4$ 1s, or both halves contain two 1s but their positions do not match.
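The informal description of $L_1$ can be compared with the Hamming-distance definition on all short strings with exactly four 1s. This is a minimal sketch with hypothetical helper names; it only confirms the characterization for the lengths it enumerates.

```python
from itertools import combinations

def hamming(u, v):
    """Hamming distance of two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))

def in_L(w):
    """w = uv with |u| = |v| and d(u, v) >= 2."""
    if len(w) % 2:
        return False
    h = len(w) // 2
    return hamming(w[:h], w[h:]) >= 2

def informal_L1(w):
    """Informal description, for an even-length w with exactly four 1s."""
    h = len(w) // 2
    left, right = w[:h], w[h:]
    if left.count("1") != 2:
        return True              # one half holds 0, 1, 3 or 4 of the 1s
    return left != right         # two 1s each, at non-matching positions

def four_ones_strings(n):
    """All binary strings of length n with exactly four 1s."""
    for positions in combinations(range(n), 4):
        symbols = ["0"] * n
        for p in positions:
            symbols[p] = "1"
        yield "".join(symbols)

def characterization_holds(max_half):
    """Check in_L == informal_L1 for every half-length from 2 to max_half."""
    return all(
        in_L(w) == informal_L1(w)
        for h in range(2, max_half + 1)
        for w in four_ones_strings(2 * h)
    )
```

The agreement is expected: with four 1s in total the Hamming distance of the two halves is even, so it is at least 2 exactly when the halves differ.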

Suppose $L_1$ is CF, let $G$ be a grammar for it in Chomsky normal form, and let

$w = uv = 0^a 1 0^b 1 0^c 1 0^d 1 0^e \in L_1$

We have $|u|=|v|$ (even length) and $d(u,v) \geq 2$.

If we restrict our attention to the ways in which the four 1s of $w$ can be generated, we have three cases, shown in the top part of Figure 1. The central part of Figure 1 shows the first case (the others are similar).

Figure 1

If we pick $a=e, c=2a$ and $b,d \gg a$, we see that the zeros between the two pairs of 1s must be pumped independently (the red nodes in the figure): in particular, for sufficiently large $b \gg a$, we get a repeated nonterminal node on the inner subtree (node X in Figure 2) or a repeated subsequence on the path to the first or second 1 (node Y in Figure 2). Note that Figure 2 is slightly simplified: there may be more nonterminal nodes between the two $X$s and also between the two $Y$s ( $Y\to ... \to Z_i \to ... Y$ but with $Z_i$ producing only 0s to the right of the first 1).

Figure 2

So we can fix an arbitrary $a = e = k, c = 2a$, and then pick a large enough $b$ to get an independently pumpable node on the sequence of zeros between the first and second $1$. For the sequence of zeros between the third and fourth 1 we can pick $d = b! +b$.
But $0^b$ is independently pumpable, so there is a pumpable substring $y$ of length $p \leq b$, i.e. such that $0^b = xyz, |y|=p, |x|\geq 0, |z|\geq 0$ and $|xy^iz| = b!+b$ for a suitable $i$. The string we obtain is:

$w' = 0^k 1 0^{b!+b} 1 0^{2k} 1 0^{b!+b} 1 0^k$

but $w' \notin L_1$. Thus $L_1$ is not CF and, finally, $L$ is not CF.
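To make the two strings concrete, the sketch below builds $w$ and $w'$ for small illustrative values of $k$ and $b$ (chosen for readability, not the "large enough" $b$ the argument needs) and tests membership directly. It illustrates only that $w \in L_1$ and $w' \notin L_1$ for these values; it does not check the pumping step itself. The helper names are hypothetical.

```python
from math import factorial

def hamming(u, v):
    """Hamming distance of two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))

def in_L(w):
    """w = uv with |u| = |v| and d(u, v) >= 2."""
    if len(w) % 2:
        return False
    h = len(w) // 2
    return hamming(w[:h], w[h:]) >= 2

def word(a, b, c, d, e):
    """The string 0^a 1 0^b 1 0^c 1 0^d 1 0^e."""
    return "1".join("0" * n for n in (a, b, c, d, e))

k, b = 1, 3                      # assumed small values, for illustration only
target = factorial(b) + b        # d = b! + b

w = word(k, b, 2 * k, target, k)             # the starting string
w_pumped = word(k, target, 2 * k, target, k)  # w' after pumping 0^b up to 0^(b!+b)

half = len(w_pumped) // 2
print(in_L(w), in_L(w_pumped), w_pumped[:half] == w_pumped[half:])
```

For these values the two halves of `w_pumped` are identical, which is why it falls outside the language.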

If the proof is correct (???) it can be extended to every language $L_k = \{ uv : |u|=|v|, d(u,v)\geq k\}, k\geq 2$

Vor

I'm afraid the bounty will expire before we can actually verify this proof, so unless any drastic information arises in the next 4 hours, this gets the points for being the best attempt so far.

jmite

@jmite: don't worry there are high chances that it is a wrong attempt like the previous one (which lasted for about 30 mins before discovering a trivial error) :-) :-)

Vor

Why the case distinction? The branches in the grammar have no relation to the halves of the word. But I think it does not matter; if the proof works, the case distinction is not needed. Looking at the assumed grammar and using the proof of the Pumping Lemma instead of the lemma itself is a nice trick (it should be done more often). I have one (real) problem: if you pump the substring $0^b$, you get $0^{b+p(i-1)}$; I don't see how you arrive at $b+b!$. I don't think this should hurt the proof, but better check. Also, you may want to fix some notation (and typos).

Raphael

@Raphael: thanks for the comments. Perhaps I'm wrong, but if you pick as target length $b+b!$ then for every pumping length $p$, the string $0^b$ can be decomposed as $xyz$ $(|xyz|=b, |y|=p \leq b)$ and can be pumped to $|xy^iz| = b + b!$; indeed in your example $p$ surely divides $b!$, so there is an $(i-1)$ for which $p(i-1)=b!$, but the original string length is $b$, so the total pumped length is $|xy^{i}z| = b+b!$. I remember it from a couple of exercises that use Ogden's lemma ... now I'll double check them.
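Written out as a short derivation, with $p = |y|$ and $i$ the number of copies of $y$ as in the comment above, the only fact used is that every $p \leq b$ divides $b!$:

$$
|xy^{i}z| = |xyz| + (i-1)\,|y| = b + (i-1)\,p, \qquad i = 1 + \frac{b!}{p} \;\Longrightarrow\; |xy^{i}z| = b + b!.
$$

For instance, with $b = 3$ and $p = 2$ this gives $i = 1 + 6/2 = 4$ and $|xy^{4}z| = 3 + 3 \cdot 2 = 9 = 3 + 3!$.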

Vor

@Raphael: ... I didn't find the proof anywhere but only a paper by Zach Tomaszewski that proves that the complement of $L_{dup} = \{ ww \}$ is CF (see question), so perhaps it is a new result (though simple); and a pumping-lemma-style theorem can be derived for languages with strings that contain a finite number of a particular symbol and substrings of arbitrary length between them.

Vor

After 2 failed attempts, that were disproved by @Hendrik Jan (thank you), here is another one, that is not more successful. @Vor found an example of a deterministic CF language where the same construction would apply, if correct. This allowed identifying an error in the anchoring of the $y$ string in the application of the lemma. The lemma itself does not seem at fault. This is clearly too simplistic a construction. See more details in the comments.

The language $L = \{ uxvy \mid u,v,x,y \in \{ 0,1 \}^* \setminus \{ \epsilon \},\ |u| = |v|,\ u \neq v,\ |x| = |y|,\ x \neq y \}$ is not context-free.

It can also be written as $L= \{uv:|u|=|v|,d(u,v) \geq 2\}$, where $d$ is the Hamming distance.

Consider a string $10^i10^j$ with $i \lt j$ and $i+j$ even. It is in the language, and the boundary between $u$ and $x$ can be placed anywhere between the two 1's. We want to pump that string on the first part between the 1's, so that it will become $10^j10^j$ which is not supposed to be in the language.

We first try to use Ogden's lemma, which is like the pumping lemma, but applies to $p$ or more distinguished symbols that are marked on the string, $p$ being the pumping length for marked symbols (but the lemma can pump more because it can pump also unmarked symbols). The pumping marked-length $p$ depends only on the language. This attempt will fail, but the failure will be a hint.

We can then choose $i=p$ and we mark symbols on the first sequence of $i$ 0's. We know that none of the two 1's will be in the pump, because it can pump out once (exponent 0) instead of pumping in. And pumping out the 1's would get us out of the language.

However, we could be pumping on both sides of the second 1 as fast or even faster on the right side, so that the second 1 would never get across the middle of the string. Also Ogden's lemma does not fix an upper limit to the size of what is being pumped, so that it is not possible to organize the pumping to get the rightmost 1 exactly across the middle of the string.

We use a modified version of the lemma, here called Nash's Lemma, which can handle these difficulties.

We first need a definition (it probably has another name in the literature, but I do not know which - help is welcome). A string $u$ is said to be an erasure of a string $v$ iff it is obtained from $v$ by erasing symbols in $v$ . We will note $u \prec v$ .
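The erasure relation is what is usually called the subsequence relation, and it can be tested with a single left-to-right scan. A minimal sketch, assuming that "erasing symbols" allows erasing zero or more symbols (so every string is an erasure of itself); the function name is hypothetical.

```python
def is_erasure(u, v):
    """True if u can be obtained from v by erasing zero or more symbols of v."""
    remaining = iter(v)
    # Each membership test consumes `remaining` up to and including the match,
    # so the symbols of u must be found in v in order.
    return all(symbol in remaining for symbol in u)

print(is_erasure("00", "0100"))   # the two 0's survive, "1" and one "0" are erased
print(is_erasure("11", "010"))    # v has only one "1"
```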

Nash's Lemma : If $L$ is a context-free language, then there exist two numbers $p\gt0$ and $q\gt 0$ such that for any string $w$ of length at least $p$ in $L$ , and every way of “marking” $p$ or more of the positions in $w$ , $w$ can be written as $w=uxyzv$ with strings $u$ , $x$ , $y$ , $z$ , $v$ , such that

$xz$ has at least one marked position,
$xyz$ has at most $p$ marked positions, and
there are 3 strings $\hat x$ , $\hat y$ , $\hat z$ such that
1. $\hat x \prec x$ , $\hat y \prec y$ , $\hat z \prec z$ ,
2. $1 \leq \mid \hat x \hat z \mid \leq q$ , $1 \leq \mid \hat y \mid \leq q$ , and
3. $ux^j\hat x^i\hat y\hat z^iz^jv$ is in $L$ for every $i \geq 0$ and for every $j \geq 0$ .

Proof: Similar to the proof of Ogden's lemma, but the subtrees corresponding to the strings $y$ and $xz$ are pruned so that they do not contain any path with twice the same non-terminal (except for the roots of these two subtrees). This necessarily limits the size of the generated strings $\hat x\hat z$ and $\hat y$ by a constant $q$ . The strings $x^j$ and $z^j$ , for $j \geq 0$ , corresponding to an unpruned version of the tree, are used mainly with $j=1$ to simplify the accounting when the lemma is applied.

We modify the above proof attempt by marking the $p$ leftmost symbols 0, but they are followed by $2q$ symbols 0 to make sure that we pump in the left part of the string, between the two 1's. That makes a total of $i = p + 2q$ 0's between the 1's (actually $i = p + q$ would be sufficient, since the rightmost 1 cannot be in $\hat z$ , which would allow it to be simply removed).

What is left is to choose $j$ so that we can pump exactly the right number of 0's to make the two sequences equal. But so far, the only constraint on $j$ is to be greater than $i$ . And we also know that the number of 0's that are pumped at each pumping is between 1 and $q$ . So let $h$ be the product of the first $q$ integers. We choose $j=i+h$ .

Hence, since the pumping increment $d$ - whatever it is - is in $[1,q]$ , it divides $h$ . Let $k$ be the quotient. If we pump exactly $k$ times, we get a string $10^j10^j$ which is not in the language. Hence L is not context-free.
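The counting step can be illustrated numerically. The sketch below checks only the arithmetic, under the answer's own assumption that every pumped 0 lands between the two 1's; it says nothing about whether the lemma can be applied this way, which is the part identified above as flawed. The values of `i` and `q` are arbitrary small stand-ins (with `q >= 2` so that $i+j$ is even), and the function names are hypothetical.

```python
from math import factorial

def hamming(u, v):
    """Hamming distance of two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))

def in_L(w):
    """w = uv with |u| = |v| and d(u, v) >= 2."""
    if len(w) % 2:
        return False
    half = len(w) // 2
    return hamming(w[:half], w[half:]) >= 2

def counting_step_holds(i, q):
    """For every increment d in [1, q], pumping h/d times turns 10^i10^j into 10^j10^j."""
    h = factorial(q)                 # product of the first q integers
    j = i + h
    start = "1" + "0" * i + "1" + "0" * j
    if not in_L(start):
        return False
    for d in range(1, q + 1):
        k = h // d                   # exact, since d divides q!
        pumped = "1" + "0" * (i + k * d) + "1" + "0" * j
        if in_L(pumped):
            return False
    return True

print(counting_step_holds(3, 4))
```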

I think that I shall never see
A string lovely as a tree.
For if it does not have a parse,
The string is naught but a farce

babou

Note however that the pass over the second half reads the stack in reverse. That seems to mean that the two positions are in the same position in both halves, but in reverse?

Hendrik Jan

you are correct ... I goofed ... now I know what was nagging me at the back of my head.

babou

I recognized the argument (because I could not make it work when I tried myself).

Hendrik Jan

Should I leave this wrong answer? It is somehow helping, I think, as it makes the problem suspiciously similar to ${a^ib^jc^ka^ib^jc^k}$ . The problem is that the rules of the site are not intended to encourage wrong results for discussion (I mean, I do not enjoy downvotes any more than anyone else).

babou

@HendrikJan Did I goof again ? (BTW, thanks for making it a discussion)

babou

-1

by this question I think $L$ is context-free and generated by the following grammar $\qquad\begin{align} S &\to AXBY \mid BYAX \\ A &\to 0 \mid 0A0 \mid 0A1 \mid 1A0 \mid 1A1 \\ B &\to 1 \mid 0B0 \mid 0B1 \mid 1B0 \mid 1B1 \\ X &\to 0 \mid 0X0 \mid 0X1 \mid 1X0 \mid 1X1 \\ Y &\to 1 \mid 0Y0 \mid 0Y1 \mid 1Y0 \mid 1Y1 \\ \end{align}$

M.K. Dadsetani

This is incorrect; you cannot guard that length of AX is the same as BY. For example, your grammar generates S -> AXBY -> A011 -> 0A1011 -> 001011 which is not in the original language. Also, your symbols A and X generate the same language, same for B and Y; they can be merged.
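The counterexample can be reproduced mechanically. In the proposed grammar, $A$ and $X$ each derive exactly the odd-length strings whose middle symbol is 0, and $B$ and $Y$ those whose middle symbol is 1, so membership reduces to trying every split into four such pieces. The sketch below relies on that observation; the function names are hypothetical.

```python
def hamming(u, v):
    """Hamming distance of two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))

def in_L(w):
    """w = uv with |u| = |v| and d(u, v) >= 2."""
    if len(w) % 2:
        return False
    half = len(w) // 2
    return hamming(w[:half], w[half:]) >= 2

def odd_with_centre(s, c):
    """True if s has odd length and its middle symbol is c."""
    return len(s) % 2 == 1 and s[len(s) // 2] == c

def grammar_generates(w):
    """Membership in the language of S -> AXBY | BYAX from the answer above."""
    n = len(w)
    for i in range(1, n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                parts = (w[:i], w[i:j], w[j:k], w[k:])
                for centres in ("0011", "1100"):   # AXBY, then BYAX
                    if all(odd_with_centre(p, c) for p, c in zip(parts, centres)):
                        return True
    return False

print(grammar_generates("001011"), in_L("001011"))
```

The split `001 | 0 | 1 | 1` matches the derivation given in the comment, while the halves `001` and `011` differ in only one position.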

sdcvvc