Print word containing string and first word

10

I want to find a string in a line of text and print the string (between spaces) and the first word of the sentence.

For example:

"This is a single text line"
"Another thing"
"It is better you try again"
"Better"

The list of strings is:

text
thing
try
Better

What I'm looking for is to obtain a table like this:

This [tab] text
Another [tab] thing
It [tab] try
Better

I tried with grep but nothing happened. Any suggestions?

command-line text-processing regex

— Felipe Lira

So, basically, "If the line has a string, print the first word + string". Right?

— Sergiy Kolodyazhnyy

12

Bash/grep version:

#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.

text_file="$1"
shift

for string; do
    # Find string in file. Process output one line at a time.
    grep "$string" "$text_file" | while read -r line; do
        # Get the first word of the line.
        first_word="${line%% *}"
        # Remove special characters from the first word.
        first_word="${first_word//[^[:alnum:]]/}"

        # If the first word is the same as the string, don't print it twice.
        if [[ "$string" != "$first_word" ]]; then
            echo -ne "$first_word\t"
        fi

        echo "$string"
    done
done

Call it like this:

./string-and-first-word.sh /path/to/file text thing try Better

Output:

This    text
Another thing
It  try
Better

— wjandrea

9

Perl to the rescue!

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;

open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}

Save as first-plus-word, run as

perl first-plus-word file.txt text thing try Better

It creates a regex from the input words. Each line is then matched against the regex; if there is a match, the first word is printed, and if the matched word differs from the first word, the matched word is printed too.
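
For comparison, the same idea can be sketched in Python, the language used by another answer in this thread. This is only an illustration, not choroba's code: the function name first_plus_word and the inline sample data are made up here, and re.escape and Python's \b are assumed to be close enough stand-ins for Perl's quotemeta and \b.

```python
import re


def first_plus_word(lines, words):
    """Return 'first<TAB>match' (or just 'first') for each line containing a word."""
    # Escape each word so it is matched literally, then join into one alternation.
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternation + r")\b")
    results = []
    for line in lines:
        found = pattern.search(line)
        if not found:
            continue  # no search word on this line
        first = line.split()[0]
        match = found.group(1)
        # Add the match only when it differs from the first word.
        results.append(first if match == first else first + "\t" + match)
    return results


# Hypothetical sample data, taken from the sentences in the question.
sample = [
    "This is a single text line",
    "Another thing",
    "It is better you try again",
    "Better",
]
for row in first_plus_word(sample, ["text", "thing", "try", "Better"]):
    print(row)
```

Returning a list instead of printing inside the loop is a choice made for this sketch so that the result is easy to inspect; the Perl script prints as it goes.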

— choroba

9

Here's an awk version:

awk '
  NR==FNR {a[$0]++; next;} 
  {
    gsub(/"/,"",$0);
    for (i=1; i<=NF; i++)
      if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
  }
  ' file2 file1

where file2 contains the list of words and file1 contains the sentences.
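
The logic of the awk one-liner (load the word list into a lookup table, then test every field of every sentence against it) might look roughly like this in Python. The function name match_fields is hypothetical, and both inputs are passed as lists of lines instead of being read from file2 and file1.

```python
def match_fields(word_lines, sentence_lines):
    """Mimic the awk logic: look each whitespace-separated field up in a word set."""
    # Like `NR==FNR {a[$0]++; next}`: every line of the word list becomes a key.
    wanted = {w.rstrip("\n") for w in word_lines}
    rows = []
    for line in sentence_lines:
        # Like gsub(/"/,"",$0) followed by awk's default field splitting.
        fields = line.replace('"', "").split()
        for i, field in enumerate(fields):
            if field in wanted:
                # The first field is printed alone; later fields get the first word prepended.
                rows.append(field if i == 0 else fields[0] + "\t" + field)
    return rows


# Hypothetical sample data, taken from the question (quotes included).
words = ["text", "thing", "try", "Better"]
sentences = [
    '"This is a single text line"',
    '"Another thing"',
    '"It is better you try again"',
    '"Better"',
]
for row in match_fields(words, sentences):
    print(row)
```

As in the awk version, only whole fields are compared, so a word followed directly by punctuation (for example text.) would not be found.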

— steeldriver

2

Nice one! I've put it into a script file, paste.ubuntu.com/23063130, just for convenience.

— Sergiy Kolodyazhnyy

8

Here's a Python version:

#!/usr/bin/env python
from __future__ import print_function 
import sys

# List of strings that you want
# to search for in the file. Change it
# as you see fit. Remember the commas.
strings = [
          'text', 'thing',
          'try', 'Better'
          ]


with open(sys.argv[1]) as input_file:
    for line in input_file:
        for string in strings:
            if string in line:
                words = line.strip().split()
                print(words[0], end="")
                if len(words) > 1:
                    # Concatenate so print() does not add a space after the tab.
                    print("\t" + string)
                else:
                    print("")

Demo:

$> cat input_file.txt
This is a single text line
Another thing
It is better you try again
Better
$> python ./initial_word.py input_file.txt
This    text
Another thing
It  try
Better

Side note: the script is python3-compatible, so you can run it with either python2 or python3.

— Sergiy Kolodyazhnyy

7

Try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/p' File
This    text
Another thing
It      try
        Better

If the tab before Better is a problem, try this:

$ sed -En 's/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/; ta; b; :a; s/^\t//; p' File
This    text
Another thing
It      try
Better

The above was tested on GNU sed (called gsed on OSX). For BSD sed, some minor changes may be needed.

How it works

s/(([[:alnum:]]+)[[:space:]].*)?(text|thing|try|Better).*/\2\t\3/

This looks for a word, [[:alnum:]]+, followed by a space, [[:space:]], followed by anything, .*, followed by one of your words, text|thing|try|Better, followed by anything. If that is found, it is replaced with the first word on the line (if any), a tab, and the matched word.

ta; b; :a; s/^\t//; p

If the substitution command resulted in a substitution, meaning that one of your words was found on the line, then the ta command tells sed to jump to label a. If not, then we branch (b) to the next line. :a defines the label a. So, if one of your words was found, we (a) do the substitution s/^\t//, which removes a leading tab if there is one, and (b) print (p) the line.
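
To make the role of the optional group more concrete, here is a rough Python rendering of the same pattern. It is a sketch under assumptions, not an exact port: the function name first_and_match is made up, [A-Za-z0-9] and \s are ASCII-oriented stand-ins for [[:alnum:]] and [[:space:]], Python's backtracking engine is not guaranteed to choose the same match as sed on every input, and the function returns only the rebuilt fields instead of editing the line in place as s/// does.

```python
import re

# Optional "first word, whitespace, anything" prefix, then one of the search words.
PATTERN = re.compile(r"(?:([A-Za-z0-9]+)\s.*)?(text|thing|try|Better).*")


def first_and_match(line):
    """Return 'first<TAB>word', just 'word', or None when no word is found."""
    found = PATTERN.search(line)
    if found is None:
        return None  # comparable to sed's `b`: nothing is printed for this line
    first, word = found.group(1), found.group(2)
    # Group 1 is unset when the search word itself starts the match; that is
    # the case where sed needs the extra s/^\t// to drop the leading tab.
    return word if first is None else first + "\t" + word


# Hypothetical sample data, taken from the demo input used in this thread.
for line in ["This is a single text line", "Another thing",
             "It is better you try again", "Better"]:
    result = first_and_match(line)
    if result is not None:
        print(result)
```

Because the empty first word is detected directly from the unset group, the sketch needs no second clean-up step; in sed the same effect takes the ta; b; :a; s/^\t//; p sequence.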

— John1024

7

A simple bash/sed approach:

$ while read w; do sed -nE "s/\"(\S*).*$w.*/\1\t$w/p" file; done < words 
This    text
Another thing
It  try
    Better

The while read w; do ...; done < words will iterate over each line of the file words and save it as $w. The -n makes sed not print anything by default. The sed command will then replace a double quote followed by non-whitespace characters (\"(\S*); the parentheses serve to "capture" what is matched by \S*, the first word, so we can later refer to it as \1), 0 or more characters (.*), then the word we are looking for ($w), and again 0 or more characters (.*). If this matches, we replace it with only the first word, a tab and $w (\1\t$w), and print the line (that is what the p in s///p does).

— terdon

5

This is a Ruby version:

str_list = ['text', 'thing', 'try', 'Better']

File.open(ARGV[0]) do |f|
  f.each_line do |line|
    words = line.split(' ')
    # Test every search string against every line, not only the one at the same index.
    str_list.each do |str|
      next unless line.match(str)
      if words.length == 1
        puts words[0]
      else
        puts words[0] + "\t" + str
      end
    end
  end
end

The sample text file hello.txt contains

This is a single text line
Another thing
It is better you try again
Better

Running it with ruby source.rb hello.txt results in

This    text
Another thing
It      try
Better

— Anwar