How to replace multiple substrings of a string?

Question 1

I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax.

What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.

Question 2

Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# On Python 2, use rep.iteritems() instead of rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

Question 3

You could just make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

where text is the complete string and dic is a dictionary: each key is a string to search for, and its value is the string that replaces every match.

Note: on Python 2, use iteritems() instead of items().

Careful: before Python 3.7, dictionaries had no reliable iteration order (since 3.7 they preserve insertion order). This solution only solves your problem if:

the order of replacements is irrelevant
it is OK for a replacement to change the results of previous replacements

For instance:

d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)  # strings are immutable, so keep the result
print(my_sentence)

Possible output #1:

"This is my pig and this is my pig."

Possible output #2:

"This is my dog and this is my pig."

One possible fix is to use an OrderedDict.

from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)  # keep the returned string
print(my_sentence)

Output:

"This is my pig and this is my pig."

Careful #2: this is inefficient if your text string is very large or there are many pairs in the dictionary, because every pair triggers a full pass over the string.
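
To make the ordering caveat concrete, here is a minimal sketch that takes the replacements as a list of pairs, so the order is explicit. The name replace_in_order is only an illustration and does not come from the answer above.

```python
def replace_in_order(text, pairs):
    """Apply each (old, new) replacement in sequence, in the order given."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

sentence = "This is my cat and this is my dog."

# "cat" becomes "dog" first, then every "dog" (old and new) becomes "pig".
cat_first = replace_in_order(sentence, [("cat", "dog"), ("dog", "pig")])

# "dog" becomes "pig" first, so the "dog" produced from "cat" survives.
dog_first = replace_in_order(sentence, [("dog", "pig"), ("cat", "dog")])

print(cat_first)  # This is my pig and this is my pig.
print(dog_first)  # This is my dog and this is my pig.
```

The same pairs give two different results, which is why chained replacements are only safe when no replacement text contains another search string.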

Question 4

Why not a solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

#output will be:  The quick red fox jumps over the quick dog

Question 5

Here is a variant of the first solution using reduce, in case you like being functional. :)

from functools import reduce  # reduce is no longer a builtin on Python 3

repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)

martineau's even better version:

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

Question 6

This is just a more concise recap of FJ's and MiniQuark's great answers. All you need to achieve multiple simultaneous string replacements is the following function:

import re

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can build your own dedicated replacement functions starting from this simpler one.

Question 7

I built this upon FJ's excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One-shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.

Note that since the replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".

If you need to do the same replacement many times, you can easily create a replacement function:

>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
                       u'Does this work?\tYes it does',
                       u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

Improvements:

turned the code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement

Enjoy! :-)

Question 8

I would like to propose the use of string templates. Just put the replacement values in a dictionary, mark the placeholders in the string with $, and you are all set! Example from docs.python.org:

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
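
As a small complement to the documentation example, the sketch below shows two details that matter when using Template for replacements: "$$" produces a literal dollar sign, and safe_substitute leaves unknown placeholders untouched. The template text and values are made up for illustration.

```python
from string import Template

# Hypothetical template text; only $-prefixed names are treated as placeholders.
template = Template("$greeting, $name! You owe $$5.")
values = {"greeting": "Hello", "name": "Tim"}

# "$$" is an escape that produces a literal dollar sign.
result = template.substitute(values)
print(result)  # Hello, Tim! You owe $5.

# safe_substitute leaves unknown placeholders in place instead of raising KeyError.
partial = Template("$greeting, $name!").safe_substitute({"greeting": "Hi"})
print(partial)  # Hi, $name!
```

Note that this only works when you control the input text and can mark the parts to replace with $; it does not replace arbitrary substrings.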

Question 9

In my case, I needed a simple replacement of unique keys with names, so I came up with this:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

Question 10

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:

# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'
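
A runnable sketch of the same idea follows (Python 3.8 or later). It also shows that the comprehension collects every intermediate string, which a plain loop avoids; the names steps and plain are only for illustration.

```python
text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]

# The comprehension rebinds `text` on every iteration and also collects
# each intermediate string; only the final value of `text` is usually wanted.
steps = [text := text.replace(old, new) for old, new in replacements]

print(steps[0])  # The quick red fox jumps over the lazy dog
print(text)      # The quick red fox jumps over the quick dog

# Equivalent plain loop, which avoids building the throwaway list.
plain = "The quick brown fox jumps over the lazy dog"
for old, new in replacements:
    plain = plain.replace(old, new)
```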

Question 11

Here are my $0.02. It is based on Andrew Clark's answer, just a little clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins).

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)

It is available in this gist; feel free to modify it if you have any proposal.
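
To illustrate why the keys are sorted longest first, here is a minimal sketch comparing the two orderings on the 'ab'/'abc' case from the comments in the function. The helper sub_all is hypothetical and exists only for this comparison.

```python
import re

replacements = {"ab": "AB", "abc": "ABC"}

def sub_all(keys, string):
    """Replace every key found in `string`, trying alternatives in the order given."""
    regexp = re.compile("|".join(map(re.escape, keys)))
    return regexp.sub(lambda match: replacements[match.group(0)], string)

# Alternation is tried left to right, so the shorter key can shadow the longer one.
shortest_first = sub_all(["ab", "abc"], "hey abc")
longest_first = sub_all(sorted(replacements, key=len, reverse=True), "hey abc")

print(shortest_first)  # hey ABc
print(longest_first)   # hey ABC
```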

Question 12

I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        # list() makes the pairs indexable on Python 3, where items() is a view
        reps = list(reps.items())
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, using for example this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

The following function can help find erroneous regular expressions among your dictionary keys (since the error message from multiple_replace is not very telling):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

Note that it does not chain the replacements; it performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more replacement pairs and ensure the expected ordering of the pairs:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

Question 13

Note: test your own case, see the comments.

Here is a sample that is more efficient on long strings with many small replacements.

import re

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print(replace(source, replacements))

The point is to avoid many concatenations of long strings. We chop the source string into fragments, replacing some of the fragments as we build the list, and then join the whole thing back into a string.
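
Since the note above advises testing your own case, here is a minimal timing sketch using the standard timeit module. The input size, the repeat count, and the function names with_regex and with_loop are assumptions for illustration; actual timings depend on your data and Python version.

```python
import re
import timeit

replacements = {'is': 'was', 'does': 'did', '!': '?'}
source = "Here is foo, it does moo! " * 1000  # hypothetical test input

finder = re.compile("|".join(re.escape(k) for k in replacements))

def with_regex(text):
    """Single pass using one alternation pattern."""
    return finder.sub(lambda m: replacements[m.group(0)], text)

def with_loop(text):
    """One str.replace pass per pair."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

# The two approaches only agree when no replacement output matches another key.
same = with_regex(source) == with_loop(source)

for func in (with_regex, with_loop):
    seconds = timeit.timeit(lambda: func(source), number=20)
    print(f"{func.__name__}: {seconds:.4f} s")
```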

Question 14

I was struggling with this problem as well. With many substitutions, regular expressions struggle and are about four times slower than a loop of string.replace calls (in my experiment conditions).

You should absolutely try the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s (regular expressions took 7.7 s) for each document.

It is easy to find usage examples in the links above, but this is a working example:

    from flashtext import KeywordProcessor
    self.processor = KeywordProcessor(case_sensitive=False)
    for k, v in self.my_dict.items():
        self.processor.add_keyword(k, v)
    new_string = self.processor.replace_keywords(string)

Note that Flashtext makes substitutions in a single pass (to avoid a -> b and b -> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is made of several words (replacing 'This is' with 'Hello').
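
For readers without Flashtext installed, the single-pass, whole-word behaviour described above can be approximated with the standard re module. This is a different technique (a regex alternation with \b word boundaries), not Flashtext's algorithm or API, and unlike the example above it is case-sensitive. The function name replace_whole_words is hypothetical, and the sketch assumes every key starts and ends with a word character.

```python
import re

def replace_whole_words(text, mapping):
    """Single-pass, whole-word replacement using only the standard library."""
    # Longest keys first so multi-word keys win over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# 'is' inside 'this' is left alone, and 'a' -> 'b' is not chained into 'c'.
result = replace_whole_words("this is a test", {"is": "was", "a": "b", "b": "c"})
print(result)  # this was b test
```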

Question 15

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

Usage:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

Notes:

This consumes the input dictionary.
Python dicts preserve key order as of 3.6; the corresponding caveats in other answers are no longer relevant. For backward compatibility, one could resort to a tuple-based version:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])

Note: as with all recursive functions in Python, too large a recursion depth (that is, too large a replacement dictionary) will result in an error. See for example here.
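
The first note is easy to overlook, so here is a minimal sketch of the side effect and one way around it: passing a shallow copy made with dict(). The variable names are only for illustration.

```python
mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

replacements = {'a': '1', 'c': '2'}

# Passing a shallow copy keeps the caller's dictionary intact.
result = mrep('abcabc', dict(replacements))
print(result)        # 1b21b2
print(replacements)  # {'a': '1', 'c': '2'}

# Passing the dictionary itself empties it as a side effect.
mrep('abcabc', replacements)
print(replacements)  # {}
```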

Question 16

You really should not do it this way, but I just find it way too cool:

>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k,v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r quotes the strings in the generated code
...
>>> exec(cmd)

Now, answer is the result of all the replacements applied in turn.

Again, this is very hacky and is not something you should use regularly. But it is nice to know that you can do something like this if you ever need to.

Question 17

I don't know about speed, but this is my everyday quick fix:

from functools import reduce  # needed on Python 3

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

... but I like the #1 regex answer above. Note: if one new value is a substring of another, the operation is not commutative.

Question 18

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:

import pandas as pd

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

And the modified text is:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

You can find an example here. Note that the replacements on the text are done in the order they appear in the lists.

Question 19

For replacing only single characters, translate with str.maketrans is my favorite method.

tl;dr > result_string = your_string.translate(str.maketrans(dict_mapping))

Demo:

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.
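
Two further properties of str.maketrans are worth a quick sketch: mapping a character to None deletes it, and dictionary keys must be single characters. The sample strings are made up for illustration.

```python
# Map single characters to replacements; None deletes the character.
mapping = str.maketrans({'i': 's', 's': 'S', '.': None})
result = 'This is a test string.'.translate(mapping)
print(result)  # ThsS sS a teSt Strsng

# Keys must be single characters; longer keys raise ValueError.
try:
    str.maketrans({'is': 'was'})
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

So translate is a good fit for character-level mappings, while multi-character substrings still need one of the other approaches in this thread.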

Question 20

Starting from Andrew's precious answer, I developed a script that loads the dictionary from a file and processes all the files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I am a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {} # creation of empty dictionary

with open(mapfile) as temprep: # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

for filename in glob.iglob(mask): # iterate over all the files matching the prompted mask

    with open (filename, "r") as textfile: # load each file in the variable text
        text = textfile.read()

        # start replacement
        #rep = dict((re.escape(k), v) for k, v in rep.items()) commented to enable the use in the mapping of re reserved characters
        pattern = re.compile("|".join(rep.keys()))
        text = pattern.sub(lambda m: rep[m.group(0)], text)

        # write the output file, replacing the last four characters (e.g. ".txt") with the prompted suffix
        with open(filename[:-4] + suff, "w") as target:
            target.write(text)

Question 21

This is my solution to the problem. I used it in a chatbot to replace the different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if s:
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"}))

This prints: The cat hunts the dog

Question 22

One more example: input list

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]
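
The one-liner above is hard to read and, for each word, only applies the first marker that matches. A more explicit sketch that removes every marker is shown below; the helper name strip_markers is hypothetical.

```python
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

def strip_markers(word, markers):
    """Remove every occurrence of each marker from a single word."""
    for marker in markers:
        word = word.replace(marker, "")
    return word

cleaned = [strip_markers(w, error_list) for w in words]
print(cleaned)  # ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
```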

Question 23

Or just for a fast hack:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Question 24

Here is another way of doing it with a dictionary:

listA="The cat jumped over the house".split()
modify = {word:word for number,word in enumerate(listA)}
modify["cat"],modify["jumped"]="dog","walked"
print(" ".join(modify[x] for x in listA))
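
A slightly shorter sketch of the same word-level idea uses dict.get with a default, so the dictionary only needs to hold the words that actually change. Variable names are illustrative.

```python
sentence = "The cat jumped over the house"
modify = {"cat": "dog", "jumped": "walked"}

# dict.get falls back to the original word when no replacement is defined.
result = " ".join(modify.get(word, word) for word in sentence.split())
print(result)  # The dog walked over the house
```

Like the version above, this only replaces whole whitespace-separated words, so punctuation attached to a word prevents a match.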