Finding the maximal factorizations of regular languages

Let the language $\mathcal{L} \subseteq \Sigma^*$ be regular.

A factorization of $\mathcal{L}$ is a maximal pair $(X,Y)$ of sets of words with

$X \cdot Y \subseteq \mathcal{L}$
$X \neq \emptyset \neq Y$ ,

where $X \cdot Y = \{xy \mid x \in X, y \in Y\}$.

$(X,Y)$ is maximal if for every pair $(X',Y') \neq (X,Y)$ with $X'\cdot Y' \subseteq \mathcal{L}$, either $X \not\subseteq X'$ or $Y \not\subseteq Y'$.

Is there a simple procedure to find out which pairs are maximal?

Example:

Let $\mathcal{L} = \Sigma^*ab \Sigma^*$. The set $F = \{u, v, w\}$ is computed:

$u =(\Sigma^*, \Sigma^*ab\Sigma^*)$
$v = (\Sigma^*a\Sigma^*, \Sigma^*b\Sigma^*)$
$w = (\Sigma^*ab\Sigma^*, \Sigma^*)$

where $\Sigma = \{a,b\}$.

Another example:

$\Sigma = \{a, b\}$ and $\mathcal{L} = \Sigma^*a\Sigma$. The factorization set is $F = \{q, r, s, t\}$ with

$q = (\Sigma^*, \mathcal{L})$
$r = (\Sigma^*a, \Sigma + \mathcal{L})$
$s = (\Sigma^*aa, \epsilon + \Sigma + \mathcal{L})$
$t = (\mathcal{L}, \epsilon + \mathcal{L})$
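
The definition can be sanity-checked by brute force on short words. The following is only a sketch: it tests the four pairs above on words of length at most 5, so it gives evidence rather than a proof, and the helper names (`words`, `in_L`, `check`) are made up for this illustration.

```python
from itertools import product

def words(max_len, alphabet="ab"):
    """All words over `alphabet` of length at most `max_len`."""
    return ["".join(t) for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)]

def in_L(w):
    # L = Sigma^* a Sigma: the second-to-last letter is an a
    return len(w) >= 2 and w[-2] == "a"

# Membership tests for (X, Y) in each of the four pairs from the question
pairs = {
    "q": (lambda w: True, in_L),
    "r": (lambda w: w.endswith("a"), lambda w: len(w) == 1 or in_L(w)),
    "s": (lambda w: w.endswith("aa"), lambda w: len(w) <= 1 or in_L(w)),
    "t": (in_L, lambda w: w == "" or in_L(w)),
}

def check(in_X, in_Y, W):
    """Bounded test of X.Y <= L and of maximality on the word list W."""
    X = [w for w in W if in_X(w)]
    Y = [w for w in W if in_Y(w)]
    product_ok = all(in_L(x + y) for x in X for y in Y)
    # Maximality: no single word outside X (resp. Y) can be added
    x_max = not any(all(in_L(x + y) for y in Y) for x in W if not in_X(x))
    y_max = not any(all(in_L(x + y) for x in X) for y in W if not in_Y(y))
    return product_ok and x_max and y_max

W = words(5)
results = {name: check(in_X, in_Y, W) for name, (in_X, in_Y) in pairs.items()}
```

Testing single words is enough for maximality: if a strictly larger pair $(X',Y')$ existed, any extra word of $X'$ could already be added to $X$ on its own.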

algorithms regular-languages optimization

— Laura

I recommend reading the following paper (in particular subsection 4.1) by Jacques Sakarovitch: perso.telecom-paristech.fr/~jsaka/PUB/Files/TUA.pdf

— Cornelius Brand

I wonder if you might want to be more specific about the problem, i.e., the last sentence of your question? Are we given $X,Y$ and want to check whether $(X,Y)$ is maximal? Or is our task to enumerate all $(X,Y)$ that are maximal? In the latter case, is it clear that this list is finite, or of polynomial size? It probably doesn't make sense to ask for an algorithm that enumerates all possibilities if there are exponentially many. Also, could you specify how the language ${\cal L}$ is represented when it is given to us, and how $X,Y$ are represented? (e.g. DFA, NFA, regexp)

— DW

I don't understand your examples. Are $u,v,w$ supposed to be all the maximal pairs? $v$ does not seem to be valid ...

— Raphael

The example is taken from the paper cited above. $u,v,w$ are supposed to be maximal pairs. I also don't understand how $v$ is computed, since it does not necessarily seem to be in $\mathcal{L}$. I will post another example.

— Laura

@Raphael, it seems to me that $v$ is valid. Letting $X=\Sigma^* a \Sigma^*$ and $Y=\Sigma^* b \Sigma^*$, $(X,Y)$ is a factorization, since $X \cdot Y = {\cal L}$ (consider any string that contains an $a$, then any sequence of $a$'s and/or $b$'s, then eventually a $b$: this string must have a point where the first $b$ following that $a$ appears, so that is a point where it contains $ab$). I don't have a proof that it is maximal, but I can't find any larger sets $X',Y'$ that are a factorization of ${\cal L}$.

— DW

As suggested in the comments to the question, I will try to give an (unfortunately partial) answer, at least to the extent that I have understood the problem myself (this implies that you may find errors, and if you find a way to explain one of the points below more briefly or clearly, feel free to edit the answer accordingly):

First, note that we do not actually have to compute the universal automaton of a language if all we want is its factorizations.

According to the paper mentioned in my comment ¹, there is a 1-1 correspondence between the left and right factors of a regular language; that is, given a left factor of the language, the corresponding right factor is uniquely determined, and vice versa. More precisely, we have the following:

Let $(X,Y)$ be a factorization of $L$. Then

$Y = \bigcap_{x \in X}x^{-1}L, X = \bigcap_{y \in Y}Ly^{-1},$

that is, every left factor is an intersection of right quotients and every right factor is an intersection of left quotients. Conversely, every intersection of left quotients of $L$ is a right factor of $L$, and every intersection of right quotients of $L$ is a left factor of $L$.
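
A short derivation sketch (my own filling-in of the step, not quoted from the paper) shows where these formulas come from. With $x^{-1}L = \{u \mid xu \in L\}$, the condition on the product unfolds as

$$X \cdot Y \subseteq L \iff \forall x \in X:\ xY \subseteq L \iff \forall x \in X:\ Y \subseteq x^{-1}L \iff Y \subseteq \bigcap_{x \in X} x^{-1}L.$$

So for a fixed $X$, the largest admissible $Y$ is the intersection itself; the symmetric argument with $Ly^{-1} = \{u \mid uy \in L\}$ gives the largest $X$ for a fixed $Y$, and a maximal pair has to attain both.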

Note that a regular language has only finitely many left and right quotients. Hence our problem reduces to computing the left and right quotients of the language and then their $\cap$-stable closure, that is, a minimal superset of the quotients that is closed under intersection. These are then precisely the right factors and the left factors, and it is then usually easy to see which pairs $(X,Y)$ have their product contained in $L$.
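
To make the notion of a $\cap$-stable closure concrete, here is a minimal sketch on finite toy sets. The function name `intersection_closure` is hypothetical, and the finite sets merely stand in for the (generally infinite) quotient languages; for real quotients one would intersect automata instead.

```python
def intersection_closure(family):
    """Smallest superset of `family` that is closed under pairwise intersection."""
    closed = {frozenset(s) for s in family}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                c = a & b
                if c not in closed:
                    closed.add(c)
                    changed = True
    return closed

# Toy stand-ins for three quotients
toy = [{1, 2}, {2, 3}, {1, 2, 3}]
closure = intersection_closure(toy)
```

Here the only new set is `{2}`, the intersection of the first two. Since the family is finite, the loop terminates after adding at most all subsets of the union.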

Example

To illustrate the points above, consider the first example in the question (which I also think is not correct in the paper):

Let $L = \Sigma^\ast ab \Sigma^\ast$. Now, the left quotients of $L$ are the sets $x^{-1}L$ for $x\in \Sigma^\ast$, that is, the words $u$ in $\Sigma^\ast$ that can be prefixed with $x$ to give a word of the language, i.e. $xu \in L$. When is $y^{-1}L=x^{-1}L$ for distinct $x,y$? This is the case if and only if $x$ and $y$ can be extended to words in $L$ by exactly the same suffixes. To put it in more familiar terms, they are Nerode-equivalent, and the suffixes that extend the words of a Nerode class to words in $L$ are exactly the respective left quotient.

For $L$, we see that our Nerode equivalence classes are

$N_1$, the set of words that do not contain $ab$ as a factor and end with $a$,
$N_2$, the set of words that end with $b$ and do not contain $ab$ as a factor (together with the empty word, which has the same left quotient), and
$N_3$, the set of words containing $ab$ as a factor, i.e. $N_3 = L$

They can be extended by the following sets (that is, these are the left quotients of the words in the respective classes):

$S_1 = x^{-1}L$ for $x$ in $N_1$ consists of all words in $L$ (any word can be extended by a word containing $ab$ as a factor and thereby becomes a word in $L$) together with $b\Sigma^\ast$, that is, $S_1 = L \cup b\Sigma^\ast$,
$S_2 = x^{-1}L$ for $x$ in $N_2$ is the language itself, that is, $S_2 = L$, and
$S_3 = x^{-1}L$ for $x$ in $N_3$ is obviously $\Sigma^\ast$.

That is, we have found three right factors of $L$. As $S_2\subset S_1\subset S_3$, their $\cap$-stable closure is trivially $\{S_1,S_2,S_3\}$, and those are then precisely the right factors.
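
The three quotients can be cross-checked against a DFA on short words. This is a sketch under assumptions: the three-state automaton below is my own encoding of the Nerode classes (state 0 for $N_2$ and the empty word, state 1 for $N_1$, state 2 for $N_3$), and the comparison only covers words of length at most 6.

```python
from itertools import product

# Assumed minimal DFA for L = Sigma^* ab Sigma^*
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2}
finals = {2}

def run(q, w):
    """State reached from q after reading w."""
    for c in w:
        q = delta[(q, c)]
    return q

def future(q, W):
    """Words of W accepted when starting in q (bounded left quotient)."""
    return {w for w in W if run(q, w) in finals}

W = ["".join(t) for n in range(7) for t in product("ab", repeat=n)]

S1 = {w for w in W if "ab" in w or w.startswith("b")}  # L u b Sigma^*
S2 = {w for w in W if "ab" in w}                       # L
S3 = set(W)                                            # Sigma^*

matches = (future(1, W) == S1, future(0, W) == S2, future(2, W) == S3)
```

On this word list each state's future agrees with the corresponding $S_i$, and the chain $S_2\subset S_1\subset S_3$ shows up as strict set inclusion.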

Hence, our factorization set $\mathcal{F}_L$ is of the form $(P_1,S_1),(P_2,S_2),(P_3,S_3)$ .

Now, for the left factors $P_i$ , we use the equations of the beginning of this answer:

$P_i = \bigcap_{x\in S_i} Lx^{-1}$ .

For $P_1$ , this yields $L \cup \Sigma^\ast a$ , for $P_2$ we get $\Sigma^\ast$ and for $P_3$ , we obtain $L$ . You can see this by inspection (the most popular excuse for being too lazy to state a formal proof) or by explicitly computing the right quotients (which is fairly analogous, although not completely, to computing the left quotients). Our factorizations are thus given by $\mathcal{F}_L = \{u,v,w\}$ where

$u = (P_1,S_1) = (\Sigma^\ast ab \Sigma^\ast \cup \Sigma^\ast a, \Sigma^\ast ab \Sigma^\ast \cup b\Sigma^\ast)$
$v = (P_2, S_2) = (\Sigma^\ast, \Sigma^\ast ab \Sigma^\ast)$ and
$w = (P_3, S_3) = (\Sigma^\ast ab \Sigma^\ast, \Sigma^\ast)$
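
The "by inspection" step for $P_1$ can be backed up by a bounded brute-force check. This is a sketch, not a proof: it evaluates $P_1 = \bigcap_{x\in S_1} Lx^{-1}$ directly from the definition, but only over words of length at most 6, and the predicate names are made up for the illustration.

```python
from itertools import product

W = ["".join(t) for n in range(7) for t in product("ab", repeat=n)]

def in_L(w):
    return "ab" in w

def in_S1(w):
    # S_1 = L u b Sigma^*
    return "ab" in w or w.startswith("b")

def in_P1(w):
    # Claimed left factor P_1 = L u Sigma^* a
    return "ab" in w or w.endswith("a")

# u lies in every L x^{-1}, x in S_1, iff u.x is in L for all such x
Y = [y for y in W if in_S1(y)]
P1_bounded = {u for u in W if all(in_L(u + y) for y in Y)}
P1_claimed = {u for u in W if in_P1(u)}

# Over the alphabet {a, b}, "contains ab or ends with a" means "contains an a"
same_as_contains_a = all(in_P1(w) == ("a" in w) for w in W)
```

The two sets coincide on this word list. The last line also suggests that, over a two-letter alphabet, $L \cup \Sigma^\ast a$ and $\Sigma^\ast a \Sigma^\ast$ describe the same language (and symmetrically $L \cup b\Sigma^\ast$ and $\Sigma^\ast b \Sigma^\ast$), so the pair $u$ computed here appears to agree with the pair $v$ from the question.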

Summary

To summarize (as you were asking for a simple procedure):

For computing the factorizations of a language $L$ , first compute the left quotients of $L$ .
You can do so, in the language of the paper, by constructing a minimal DFA $A$ for $L$ and then for each state $q$ in $A$ (corresponding, as a Nerode-equivalence class, to a left quotient) compute the future of $q$ in $A$ , thus obtaining one left quotient of the language for each state.
The collection of left quotients obtained in this way yields, in general, a subset $S_R$ of the right factors.
Then compute the $\cap$ -stable closure of $S_R$ , which can be done in practice by forming the intersection of every subset of $S_R$ and adding each set obtained in this way to $S_R$ .
The set $S_R$ together with all the intersections from the previous step is then the set of right factors of $L$ .
In order to obtain the left factors, we can compute the right quotients of $L$ .
These are sets of the form $Ly^{-1}$ , for $y\in \Sigma^\ast$ . Now, these are again only finitely many, and for $x\neq y$ , we have $Ly^{-1} = Lx^{-1}$ if and only if for all $u\in \Sigma^\ast$ , $ux \in L \Leftrightarrow uy \in L$ , that is they can be prefixed to words in the language with precisely the same set of strings.
To compute $Lx^{-1}$ , consider those states $q$ in $A$ such that $x$ is contained in the future of $q$ . The union of the pasts of those states constitutes one right quotient. Find all these quotients.
You know you are done when you have found as many left factors as you have right factors.
Find those pairs of left and right factors $X,Y$ such that $X\cdot Y \subseteq L$ . This is $\mathcal{F}_L$ .
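
The steps above can be sketched in code. This is one possible realization, not the paper's algorithm: instead of writing out language expressions, it represents each factorization by a set $C$ of DFA states, where $X$ is the set of words leading from the initial state into $C$ and $Y$ is the intersection of the futures of the states in $C$. It assumes a complete minimal DFA in which every state is reachable; the function names and the state numbering are hypothetical, and the loop over all subsets is exponential in the number of states.

```python
from itertools import combinations

def step(delta, T, letter):
    """Image of the state set T under one letter."""
    return frozenset(delta[(s, letter)] for s in T)

def nonempty(delta, finals, alphabet, S):
    """True iff some word lies in the future of every state in S."""
    start = frozenset(S)
    seen, todo = {start}, [start]
    while todo:
        T = todo.pop()
        if T <= finals:  # word read so far is accepted from all of S
            return True
        for a in alphabet:
            nxt = step(delta, T, a)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

def included(delta, finals, alphabet, S, q):
    """True iff the intersection of the futures of S lies in the future of q."""
    start = (frozenset(S), q)
    seen, todo = {start}, [start]
    while todo:
        T, p = todo.pop()
        if T <= finals and p not in finals:
            return False  # witness: in every future of S, not in that of q
        for a in alphabet:
            nxt = (step(delta, T, a), delta[(p, a)])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

def factorizations(states, alphabet, delta, finals):
    """State sets C, one per factorization (X, Y) with X and Y nonempty."""
    finals = frozenset(finals)
    result = set()
    for k in range(len(states) + 1):
        for S in combinations(states, k):
            if not nonempty(delta, finals, alphabet, S):
                continue  # Y would be empty
            C = frozenset(q for q in states
                          if included(delta, finals, alphabet, S, q))
            if C:  # an empty C would mean an empty X
                result.add(C)
    return result

# Assumed minimal DFA for L = Sigma^* ab Sigma^* (state 2 accepting)
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 2, (2, "b"): 2}
F = factorizations([0, 1, 2], "ab", delta, {2})
```

For the example language this yields three state sets, matching the three factorizations found above. Working with state sets sidesteps the explicit right-quotient computation: a word $u$ belongs to $X$ exactly when the state it reaches has a future containing $Y$, which is the inclusion test performed by `included`.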

¹ The Universal Automaton by Lombardy and Sakarovitch (in Texts in Logic and Games, Vol 2: Logic and Automata: History and Perspectives, 2007)

— Cornelius Brand

Nice! Let's note that $A \subseteq B$ is decidable for regular languages and that these factors $X$, $Y$ end up being regular due to closure properties. Hence we can not only effectively compute the last bullet in the summary, but we can also filter out the maximal pairs.

— Raphael