Watson-Crick palindromes


31

Problem

Create a function that can determine whether an arbitrary DNA string is a Watson-Crick palindrome or not. The function will take a DNA string and output a true value if the string is a Watson-Crick palindrome and a false value if it is not. (True and False can also be represented as 1 and 0, respectively.)

The DNA string may be either all uppercase or all lowercase, depending on your preference.

Also, the DNA string will not be empty.

Explanation

A DNA string is a Watson-Crick palindrome when the complement of its reverse is equal to itself.

Given a DNA string, first reverse it, then complement each character according to the DNA bases (A ↔ T and C ↔ G). If the original string is equal to the complemented reverse string, it is a Watson-Crick palindrome.
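As an ungolfed reference point, the definition above can be sketched in Python. The function name `is_wc_palindrome` and the use of `str.maketrans` are illustrative choices, not part of the challenge, and uppercase input is assumed.

```python
# Reference sketch of the definition: reverse, complement, compare.
# Assumes uppercase input consisting only of A, C, G and T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_wc_palindrome(dna: str) -> bool:
    """Return True if dna equals the complement of its reverse."""
    return dna == dna[::-1].translate(COMPLEMENT)
```

For example, `is_wc_palindrome("GCGC")` reverses the string to `CGCG`, complements it back to `GCGC`, and therefore returns `True`.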

For more, see this question. It is a different challenge in which you need to find the longest substring of a DNA string where that substring is a Watson-Crick palindrome.

Goal

This is code-golf and the shortest code wins.

Test cases

The format is <input> = <output>.

ATCGCGAT = true
AGT = false
GTGACGTCAC = true
GCAGTGA = false
GCGC = true
AACTGCGTTTAC = false
ACTG = false


3
Someone should write a program in DNA# that is also a Watson-Crick palindrome. :D (might not be possible)
mbomb007

Or, if you like, "a word is a Watson-Crick palindrome if it has order 2 in the free group on 2 generators" (or on n generators!).
wchargin

(I guess technically it's "order at most 2")
wchargin

1
@AndrasDeak According to Watson's book, Franklin was apparently mostly a thorn in their side. She repeatedly refused to hand over the X-rays showing the helix (as I recall), because she refused to believe it. It's worth a read if you're interested in the discovery at any rate.
Obsidian Phoenix

Answers:


27

05AB1E, 10 7 bytes

Code:

Â'š×Â‡Q

Explanation:

To check whether a string is a palindrome, we just need to compare the input with the input with a and t swapped, c and g swapped, and then reversed. So that is what we are going to do. We push the input and the reversed input using Â (bifurcate). Now comes the tricky part. 'š× is the compressed version of creating. If we reverse it, you can see why it's in the code:

CreATinG
|  ||  |
GniTAerC

This will be used to transliterate the reversed input. Transliteration is done with ‡. After that, we just check whether the input and the transliterated input are eQual and print that value. This is how the stack looks for the input actg:

Â          # ["actg", "gtca"]
 'š×       # ["actg", "gtca", "creating"]
    Â      # ["actg", "gtca", "creating", "gnitaerc"]
     ‡     # ["actg", "cagt"]
      Q    # [0]

Which can also be seen with the debug flag (Try it here).

Uses the CP-1252 encoding. Try it online!
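The transliteration step can be imitated in Python to see why the word "creating" works. This is only a sketch of the idea, not of 05AB1E's internals; the names `WORD`, `TABLE` and `is_wc_palindrome_creating` are made up for illustration, and lowercase input is assumed, as in the stack trace.

```python
# Sketch of the "creating" trick; lowercase input is assumed.
WORD = "creating"
# Map each letter of "creating" to the letter at the same index of "gnitaerc".
# Among the DNA letters this yields a<->t and c<->g; r, e, i, n never occur.
TABLE = str.maketrans(WORD, WORD[::-1])


def is_wc_palindrome_creating(dna: str) -> bool:
    # Reverse the input, transliterate it, then compare with the original.
    return dna == dna[::-1].translate(TABLE)
```

For the input `actg`, the reversed string `gtca` transliterates to `cagt`, which differs from `actg`, so the result is false, matching the trace.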


4
Very, erm, creative...
Toby Speight,

2
This language has some very neat features
miles

18

Jelly, 9 bytes

O%8µ+U5ḍP

Try it online! or verify all test cases.

How it works

O%8µ+U5ḍP  Main link. Argument: S (string)

O          Compute the code points of all characters.
 %8        Compute the residues of division by 8.
           This maps 'ACGT' to [1, 3, 7, 4].
   µ       Begin a new, monadic link. Argument: A (array of residues)
    +U     Add A and A reversed.
      5ḍ   Test the sums for divisibility by 5.
           Of the sums of all pairs of integers in [1, 3, 7, 4], only 1 + 4 = 5
           and 3 + 7 = 10 are divisible by 5, thus identifying the proper pairings.
        P  Take the product of the resulting Booleans.
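The arithmetic above can be replayed in Python. This is a sketch of the same idea rather than a translation of the Jelly link; the names `is_wc_palindrome_mod` and `valid_pairs` are illustrative, and `valid_pairs` exists only to confirm which unordered base pairs pass the divisibility test.

```python
from itertools import combinations_with_replacement


def is_wc_palindrome_mod(dna: str) -> bool:
    residues = [ord(ch) % 8 for ch in dna]  # 'ACGT' -> 1, 3, 7, 4
    sums = [a + b for a, b in zip(residues, reversed(residues))]
    return all(total % 5 == 0 for total in sums)


def valid_pairs() -> list:
    # Enumerate every unordered pair of bases and keep those whose
    # residues modulo 8 add up to a multiple of 5.
    return [x + y for x, y in combinations_with_replacement("ACGT", 2)
            if (ord(x) % 8 + ord(y) % 8) % 5 == 0]
```

Calling `valid_pairs()` leaves only `AT` and `CG`, which is why the sums identify the proper pairings.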

4
I think Python is pretty close to competing with this answer! Compare the first nine bytes of my answer: lambda s:. That's almost the full solution!
orlp

Wait, the "How it works" part doesn't really explain how it works... Why residues of 8 and sums of 5?? Where are the letters built in?
ZeroOne

@ZeroOne I've clarified that part.
Dennis

Oh wow! That's damn clever. :) Thanks!
ZeroOne

12

Python 2, 56 45 44 bytes

lambda s:s==s[::-1].translate("_T_GA__C"*32)

lambda s:s==s[::-1].translate("TCG_A"*99) works in Python 3
Alex Varga

8

Perl, 27 bytes

Includes +2 for -lp

Give input on STDIN, prints 1 or nothing:

dnapalin.pl <<< ATCGCGAT

dnapalin.pl:

#!/usr/bin/perl -lp
$_=y/ATCG/TAGC/r=~reverse

Replace $_= with $_+= to get 0 instead of empty for the false case



7

Retina, 34 33 bytes

$
;$_
T`ACGT`Ro`;.+
+`(.);\1
;
^;

Try it online! (Slightly modified to run all test cases at once.)

Explanation

$
;$_

Duplicate the input by matching the end of the string and inserting a ; followed by the entire input.

T`ACGT`Ro`;.+

Match only the second half of the input with ;.+ and perform the substitution of pairs with a transliteration. As for the target set Ro: o refers to the other set, that is, o is replaced with ACGT. But R reverses this set, so the two sets are actually:

ACGT
TGCA

If the input is a DNA palindrome, we will now have the input followed by its reverse (separated by ;).

+`(.);\1
;

Repeatedly (+) remove a pair of identical characters around the ;. This continues until either only the ; is left or the two characters around the ; are no longer identical, which would mean that the strings are not the reverse of each other.

^;

Check whether the first character is ; and print 0 or 1 accordingly.
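The four stages can be simulated with Python's `re` module. This is a rough sketch of the stage logic only, not of how Retina itself executes; the function name `is_wc_palindrome_retina` is invented for illustration and uppercase input is assumed.

```python
import re


def is_wc_palindrome_retina(dna: str) -> bool:
    # Stages 1 and 2: append ';' and a copy of the input, transliterating
    # only the copy (ACGT -> TGCA).
    work = dna + ";" + dna.translate(str.maketrans("ACGT", "TGCA"))
    # Stage 3: repeatedly delete one identical character on each side of ';'.
    while True:
        reduced = re.sub(r"(.);\1", ";", work)
        if reduced == work:
            break
        work = reduced
    # Stage 4: the left half has been fully consumed only for a palindrome.
    return work.startswith(";")
```

For `AGT` the working string goes `AGT;TCA`, then `AG;CA`, and stops there because `G` and `C` differ, so the first character is not `;`.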


6

JavaScript (ES6), 59 bytes

f=s=>!s||/^(A.*T|C.*G|G.*C|T.*A)$/.test(s)&f(s.slice(1,-1))

The best I could do without using Regexp was 62 bytes:

f=s=>!s||parseInt(s[0]+s.slice(-1),33)%32%7<1&f(s.slice(1,-1))

5

Ruby, 35

I tried other ways, but the obvious way was the shortest:

->s{s.tr('ACGT','TGCA').reverse==s}

in test program

f=->s{s.tr('ACGT','TGCA').reverse==s}

puts f['ATCGCGAT']
puts f['AGT']
puts f['GTGACGTCAC']
puts f['GCAGTGA']
puts f['GCGC']
puts f['AACTGCGTTTAC'] 

2
->s{s.==s.reverse.tr'ACGT','TGCA'} is a byte shorter
Mitch Schwartz

@MitchSchwartz Eeeek!!! It works, but I have no idea what the first . is. The code looks more right to me without it, but it is required to make it run. Is it documented anywhere?
Level River St

Are you sure you don't want to figure it out yourself?
Mitch Schwartz

@MitchSchwartz hahaha I already tried. I find Ruby's requirements for whitespace very idiosyncratic. Strange requirements for periods are a whole other issue. I have several theories but all of them may be wrong. I suspect it may have something to do with treating == as a method rather than an operator, but searching by symbols is impossible.
Level River St

You suspected correctly. :) It's just a plain old method call.
Mitch Schwartz

5

Haskell, 48 45 bytes

(==)=<<reverse.map((cycle"TCG_A"!!).fromEnum)

Usage example: (==)=<<reverse.map((cycle"_T_GA__C"!!).fromEnum) $ "ATCGCGAT" -> True.

A non-pointfree version is

f x = reverse (map h x) == x           -- map h to x, reverse and compare to x
h c = cycle "TCG_A" !! fromEnum c      -- take the ascii-value of c and take the
                                       -- char at this position of string
                                       -- "TCG_ATCG_ATCG_ATCG_A..."

Edit: @Mathias Dolidon saved 3 bytes. Thanks!
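Indexing an infinitely repeated five-character string at a code point is the same as indexing the five-character string at the code point modulo 5. The Python sketch below illustrates that lookup; the names `complement_cyclic` and `is_wc_palindrome_cycle` are hypothetical and uppercase ASCII input is assumed.

```python
def complement_cyclic(base: str) -> str:
    # Counterpart of (cycle "TCG_A" !!) . fromEnum: index the repeating
    # string "TCG_ATCG_A..." at the character's code point.
    # ord('A') % 5 == 0, ord('G') % 5 == 1, ord('C') % 5 == 2, ord('T') % 5 == 4
    return "TCG_A"[ord(base) % 5]


def is_wc_palindrome_cycle(dna: str) -> bool:
    mapped = [complement_cyclic(ch) for ch in dna]
    # Map first, then reverse, as in the non-pointfree version of f.
    return "".join(reversed(mapped)) == dna
```

The `_` at index 3 is a filler: none of the four bases has a code point congruent to 3 modulo 5.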


Works with cycle "TCG_A" too. :)
Mathias Dolidon


4

Julia, 47 38 bytes

s->((x=map(Int,s)%8)+reverse(x))%5⊆0

This is an anonymous function that accepts a Char array and returns a boolean. To call it, assign it to a variable.

This uses Dennis' algorithm, which is shorter than the naïve solution. We get the remainder of each code point divided by 8, add that to itself reversed, get the remainders from division by 5, and check whether all are 0. The last step is accomplished using ⊆, the infix version of issubset, which casts both arguments to Set before checking. This means that [0,0,0] is declared a subset of 0, since Set([0,0,0]) == Set(0). This is shorter than an explicit check against 0.
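The subset check has a close analogue in Python's set comparison. The sketch below is not Julia and only mirrors the idea; the function name `is_wc_palindrome_subset` is an assumption made for illustration.

```python
def is_wc_palindrome_subset(dna: str) -> bool:
    residues = [ord(ch) % 8 for ch in dna]
    # Collect the distinct remainders of the pairwise sums modulo 5.
    remainders = {(a + b) % 5 for a, b in zip(residues, reversed(residues))}
    # Subset test: true exactly when no remainder other than 0 occurs.
    return remainders <= {0}
```

Collapsing the remainders into a set discards how many pairs there are, so a single comparison against `{0}` replaces an element-by-element check.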

Try it online!

Saved 9 bytes thanks to Dennis!


4

Jolf, 15 Bytes

Try it!

=~A_iγ"AGCT"_γi

Explanation:

   _i            Reverse the input
 ~A_iγ"AGCT"_γ   DNA swap the reversed input
=~A_iγ"AGCT"_γi  Check if the new string is the same as the original input

3

Jolf, 16 bytes

Try it here!

pe+i~Aiγ"GATC"_γ

Explanation

pe+i~Aiγ"GATC"_γ
    ~Aiγ"GATC"_γ  perform DNA transformation
  +i              i + (^)
pe                is a palindrome

3

Actually, 19 bytes

O`8@%`M;RZ`5@Σ%Y`Mπ

This uses Dennis's algorithm.

Try it online!

Explanation:

O`8@%`M;RZ`5@Σ%Y`Mπ
O                    push an array containing the Unicode code points of the input
 `8@%`M              modulo each code point by 8
       ;RZ           zip with reverse
          `5@Σ%Y`M   test sum for divisibility by 5
                  π  product

3

Oracle SQL 11.2, 68 bytes

SELECT DECODE(TRANSLATE(REVERSE(:1),'ATCG','TAGC'),:1,1,0)FROM DUAL; 

2
With SQL like that, I'm confident you must have written reports for some of my projects before...
corsiKa

3

Julia 0.4, 22 bytes

s->s$reverse(s)⊆""

The string contains the control characters EOT (4) and NAK (21). Input must be in form of a character array.

This approach XORs the characters of the input with the corresponding characters in the reversed input. For valid pairings, this results in the characters EOT or NAK. Testing for inclusion in the string of those characters produces the desired Boolean.
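The XOR values are easy to confirm in Python. This sketch follows the description above rather than the Julia syntax; `ALLOWED` and `is_wc_palindrome_xor` are illustrative names, and input in a single case is assumed.

```python
# EOT and NAK: ord('C') ^ ord('G') == 4 and ord('A') ^ ord('T') == 21.
ALLOWED = {0x04, 0x15}


def is_wc_palindrome_xor(dna: str) -> bool:
    # XOR every character with its mirror image in the reversed string.
    xored = {ord(a) ^ ord(b) for a, b in zip(dna, reversed(dna))}
    # All results must be one of the two allowed control characters.
    return xored <= ALLOWED
```

A character paired with itself XORs to 0, which is not in the allowed set, so odd-length inputs fail at their middle character.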

Try it online!


3

C,71

r,e;f(char*s){for(r=0,e=strlen(s)+1;*s;s++)r|=*s*s[e-=2]%5^2;return!r;}

2 bytes saved by Dennis. Additional 2 bytes saved by adapting for lowercase input: constants 37 and 21 are revised to 5 and 2.

C,75

i,j;f(char*s){for(i=j=0;s[i];i++)j|=s[i]*s[strlen(s)-i-1]%37!=21;return!j;}

Saved one byte: Eliminated parenthesis by taking the product of the two ASCII codes mod 37. The valid pairs evaluate to 21. Assumes uppercase input.

C,76

i,j;f(char*s){for(i=j=0;s[i];i++)j|=(s[i]+s[strlen(s)-i-1])%11!=6;return!j;}

Uses the fact that ASCII codes of the valid pairs sum to 138 or 149. When taken mod 11, these are the only pairs that sum to 6. Assumes uppercase input.
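The magic numbers in the three versions can be checked exhaustively with a few lines of Python. This is a verification sketch only; `pairs_passing` and the variable names are hypothetical helpers, not part of the C answers.

```python
from itertools import combinations_with_replacement


def pairs_passing(test, bases):
    # Return the unordered base pairs whose ASCII codes satisfy the test.
    return {frozenset(x + y)
            for x, y in combinations_with_replacement(bases, 2)
            if test(ord(x), ord(y))}


# 76-byte version: sum of the codes modulo 11 equals 6 (uppercase).
sum_mod_11 = pairs_passing(lambda a, b: (a + b) % 11 == 6, "ACGT")
# 75-byte version: product of the codes modulo 37 equals 21 (uppercase).
prod_mod_37 = pairs_passing(lambda a, b: a * b % 37 == 21, "ACGT")
# 71-byte version: product of the codes modulo 5 equals 2 (lowercase).
prod_mod_5 = pairs_passing(lambda a, b: a * b % 5 == 2, "acgt")
```

Each of the three sets contains exactly the two complementary pairs, A with T and C with G, in the case that version assumes.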

ungolfed in test program

i,j;

f(char *s){
   for(i=j=0;s[i];i++)                  //initialize i and j to 0; iterate i through the string
     j|=(s[i]+s[strlen(s)-i-1])%11!=6;  //add characters at i from each end of string, take result mod 11. If not 6, set j to 1
return!j;}                              //return not j (true if mismatch NOT detected.)

main(){
  printf("%d\n", f("ATCGCGAT"));
  printf("%d\n", f("AGT"));
  printf("%d\n", f("GTGACGTCAC"));
  printf("%d\n", f("GCAGTGA"));
  printf("%d\n", f("GCGC"));
  printf("%d\n", f("AACTGCGTTTAC"));
} 

1
r,e;f(char*s){for(r=0,e=strlen(s)+1;*s;s++)r|=*s*s[e-=2]%37^21;return!r;} saves a couple of bytes.
Dennis

@Dennis thanks, I really wasn't in the mood for modifying pointers, but it squeezed a byte out! I should have seen != > ^ myself. I reduced another 2 by changing to lowercase input: both magic numbers are now single digit.
Level River St

3

Factor, 72 bytes

Unfortunately regex can't help me here.

[ dup reverse [ { { 67 71 } { 65 84 } { 71 67 } { 84 65 } } at ] map = ]

Reverse, lookup table, compare equal.


Wow, that's a lot of whitespace!!! Is it all necessary? Also, a link to the language homepage would be useful.
Level River St

@LevelRiverSt Unfortunately, every bit of it is necessary. I'll add a link to the header.
cat

3

Bash + coreutils, 43 32 bytes

[ `tr ATCG TAGC<<<$1|rev` = $1 ]

Tests:

for i in ATCGCGAT AGT GTGACGTCAC GCAGTGA GCGC AACTGCGTTTAC; do ./78410.sh $i && echo $i = true || echo $i = false; done
ATCGCGAT = true
AGT = false
GTGACGTCAC = true
GCAGTGA = false
GCGC = true
AACTGCGTTTAC = false

3

J - 21 bytes

0=[:+/5|[:(+|.)8|3&u:

Based on Dennis' method

Usage

   f =: 0=[:+/5|[:(+|.)8|3&u:
   f 'ATCGCGAT'
1
   f 'AGT'
0
   f 'GTGACGTCAC'
1
   f 'GCAGTGA'
0
   f 'GCGC'
1
   f 'AACTGCGTTTAC'
0
   f 'ACTG'
0

Explanation

0=[:+/5|[:(+|.)8|3&u:
                 3&u:    - Convert from char to int
               8|        - Residues from division by 8 for each
            |.           - Reverse the list
           +             - Add from the list and its reverse element-wise
        [:               - Cap, compose function
      5|                 - Residues from division by 5 for each
    +/                   - Fold right using addition to create a sum
  [:                     - Cap, compose function
0=                       - Test the sum for equality to zero

3

Labyrinth, 42 bytes

_8
,%
;
"}{{+_5
"=    %_!
 = """{
 ;"{" )!

Terminates with a division-by-zero error (error message on STDERR).

Try it online!

The layout feels really inefficient but I'm just not seeing a way to golf it right now.

Explanation

This solution is based on Dennis's arithmetic trick: take all character codes modulo 8, add a pair from both ends and make sure it's divisible by 5.

Labyrinth primer:

  • Labyrinth has two stacks of arbitrary-precision integers, main and aux(iliary), which are initially filled with an (implicit) infinite amount of zeros.
  • The source code resembles a maze, where the instruction pointer (IP) follows corridors when it can (even around corners). The code starts at the first valid character in reading order, i.e. in the top left corner in this case. When the IP comes to any form of junction (i.e. several adjacent cells in addition to the one it came from), it will pick a direction based on the top of the main stack. The basic rules are: turn left when negative, keep going ahead when zero, turn right when positive. And when one of these is not possible because there's a wall, then the IP will take the opposite direction. The IP also turns around when hitting dead ends.
  • Digits are processed by multiplying the top of the main stack by 10 and then adding the digit.

The code starts with a small 2x2, clockwise loop, which reads all input modulo 8:

_   Push a 0.
8   Turn into 8.
%   Modulo. The last three commands do nothing on the first iteration
    and will take the last character code modulo 8 on further iterations.
,   Read a character from STDIN or -1 at EOF. At EOF we will leave loop.

Now ; discards the -1. We enter another clockwise loop which moves the top of the main stack (i.e. the last character) down to the bottom:

"   No-op, does nothing.
}   Move top of the stack over to aux. If it was at the bottom of the stack
    this will expose a zero underneath and we leave the loop.
=   Swap top of main with top of aux. The effect of the last two commands
    together is to move the second-to-top stack element from main to aux.
"   No-op.

Now there's a short linear bit:

{{  Pull two characters from aux to main, i.e. the first and last (remaining)
    characters of the input (mod 8).
+   Add them.
_5  Push 5.
%   Modulo.

The IP is now at a junction which acts as a branch to test divisibility by 5. If the result of the modulo is non-zero, we know that the input is not a Watson-Crick palindrome and we turn east:

_   Push 0.
!   Print it. The IP hits a dead end and turns around.
_   Push 0.
%   Try to take modulo, but division by zero fails and the program terminates.

Otherwise, we need to keep checking the rest of the input, so the IP keeps going south. The { pulls over the bottom of the remaining input. If we've exhausted the input, then this will be a 0 (from the bottom of aux), and the IP continues moving south:

)   Increment 0 to 1.
!   Print it. The IP hits a dead end and turns around.
)   Increment 0 to 1.
{   Pull a zero over from aux, IP keeps moving north.
%   Try to take modulo, but division by zero fails and the program terminates.

Otherwise, there are more characters in the string to be checked. The IP turns west and moves into the next (clockwise) 2x2 loop which consists largely of no-ops:

"   No-op.
"   No-op.
{   Pull one value over from aux. If it's the bottom of aux, this will be
    zero and the IP will leave the loop eastward.
"   No-op.

After this loop, we've got the input on the main stack again, except for its first and last character and with a zero on top. The ; discards the 0 and then = swaps the tops of the stacks, but this is just to cancel the first = in the loop, because we're now entering the loop in a different location. Rinse and repeat.
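Stripped of the stack shuffling, the control flow amounts to peeling one value off each end of the input until nothing is left. The Python sketch below shows that outline with a `deque` standing in for the two stacks; it is an interpretation of the walkthrough, not a Labyrinth emulator, and the name `is_wc_palindrome_peel` is invented. Pairing a lone middle value with 0 is an assumption based on the implicit zeros described in the primer.

```python
from collections import deque


def is_wc_palindrome_peel(dna: str) -> bool:
    # First loop: read every character code modulo 8.
    remaining = deque(ord(ch) % 8 for ch in dna)
    while remaining:
        first = remaining.popleft()
        # Assumption: an odd-length input leaves a lone middle value,
        # which gets paired with an implicit zero.
        last = remaining.pop() if remaining else 0
        if (first + last) % 5 != 0:
            return False  # corresponds to the branch that prints 0
    return True           # input exhausted: the branch that prints 1
```

Since none of the residues 1, 3, 7 and 4 is divisible by 5, a lone middle value always fails, which is the right answer for odd-length strings.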


3

sed, 67 61 bytes

G;H;:1;s/\(.\)\(.*\n\)/\2\1/;t1;y/ACGT/TGCA/;G;s/^\(.*\)\1$/1/;t;c0

(67 bytes)

Test

for line in ATCGCGAT AGT GTGACGTCAC GCAGTGA GCGC AACTGCGTTTAC ACTG
do echo -n "$line "
    sed 'G;H;:1;s/\(.\)\(.*\n\)/\2\1/;t1;y/ACGT/TGCA/;G;s/^\(.*\)\1$/1/;t;c0' <<<"$line"
done

Output

ATCGCGAT 1
AGT 0
GTGACGTCAC 1
GCAGTGA 0
GCGC 1
AACTGCGTTTAC 0
ACTG 0

By using extended regular expressions, the byte count can be reduced to 61.

sed -r 'G;H;:1;s/(.)(.*\n)/\2\1/;t1;y/ACGT/TGCA/;G;s/^(.*)\1$/1/;t;c0'

If you can do it in 61 bytes, then that's your score -- there's nothing against NFA or turing-complete regexp on this particular challenge. Some challenges disallow regex in full, but usually only regex-golf will disallow non regular-expressions.
cat

3

C#, 65 bytes

bool F(string s)=>s.SequenceEqual(s.Reverse().Select(x=>"GACT"[("GACT".IndexOf(x)+2)%4]));

.NET has some fairly long framework method names at times, which doesn't necessarily make for the best code golf framework. In this case, framework method names make up 33 characters out of 90. :)

Based on the modulus trick from elsewhere in the thread:

bool F(string s)=>s.Zip(s.Reverse(),(a,b)=>a%8+b%8).All(x=>x%5==0);

Now weighs in at 67 characters whereof 13 are method names.

Another minor optimization to shave off a whopping 2 characters:

bool F(string s)=>s.Zip(s.Reverse(),(a,b)=>(a%8+b%8)%5).Sum()<1;

So, 65 of which 13 are framework names.

Edit: Omitting some of the limited "boilerplate" from the solution and adding a couple of conditions leaves us with the expression

s.Zip(s.Reverse(),(a,b)=>(a%8+b%8)%5).Sum()

Which gives 0 if and only if the string s is a valid answer. As cat points out, "bool F(string s)=>" is actually replaceable with "s=>" if it's otherwise clear in the code that the expression is a Func<string,bool>, i.e. maps a string to a boolean.


1
Welcome to PPCG, nice first answer! :D
cat

@cat Thanks for that! :)
robhol

1
I don't really know C#, but if this is a lambda, then you can leave out its type and assigning it, as anonymous functions are fine as long as they are assignable.
cat

1
Also, can't you do !s.Zip... instead of s.Zip...==0? (Or can't you ! ints in C#?) Even if you can't boolean-negate it, you can leave out any sort of inversion and state in your answer that this returns <this thing> for falsy and <this other deterministic, clearly discernable thing> for truthy.
cat

1
@cat: You're right about dropping the type. I thought the code had to be directly executable, but making simple assumptions about input and output makes it a bit easier. The other thing won't work, however - rightly so, in my opinion, since a boolean operation has no logical (hue hue) way to apply to a number. Assigning 0 and 1 the values of false and true is, after all, just convention.
robhol

2

REXX 37

s='ATCGCGAT';say s=translate(reverse(s),'ATCG','TAGC')

2

R, 101 bytes

g=function(x){y=unlist(strsplit(x,""));all(sapply(rev(y),switch,"C"="G","G"="C","A"="T","T"="A")==y)}

Test Cases

g("ATCGCGAT")
[1] TRUE
g("AGT")
[1] FALSE
g("GTGACGTCAC")
[1] TRUE
g("GCAGTGA")
[1] FALSE
g("GCGC")
[1] TRUE
g("AACTGCGTTTAC")
[1] FALSE
g("ACTG")
[1] FALSE

strsplit(x,"")[[1]] is 3 bytes shorter than unlist(strsplit(x,"")) and, here, is equivalent since x is always a single character string.
plannapus

2

Octave, 52 bytes

f=@(s) prod(mod((i=mod(toascii(s),8))+flip(i),5)==0)

Following Dennis's trick ... take the ASCII values mod 8, flip and add together; if every sum is a multiple of five, you're golden.


That one whitespace is significant? That's... odd.
cat

Also, you can leave out the f= assignment; unnamed functions are okay.
cat

1

Clojure/ClojureScript, 49 chars

#(=(list* %)(map(zipmap"ATCG""TAGC")(reverse %)))

Works on strings. If the requirements are loosened to allow lists, I can take off the (list* ) and save 7 chars.


1

R, 70 bytes

f=function(x)all(chartr("GCTA","CGAT",y<-strsplit(x,"")[[1]])==rev(y))

Usage:

> f=function(x)all(chartr("GCTA","CGAT",y<-strsplit(x,"")[[1]])==rev(y))
> f("GTGACGTCAC")
[1] TRUE
> f("AACTGCGTTTAC")
[1] FALSE
> f("AGT")
[1] FALSE
> f("ATCGCGAT")
[1] TRUE

1

C, 71 bytes

Requires ASCII codes for the relevant characters, but accepts uppercase, lowercase or mixed-case input.

f(char*s){char*p=s+strlen(s),b=0;for(;*s;b&=6)b|=*--p^*s++^4;return!b;}

This code maintains two pointers, s and p, traversing the string in opposite directions. At each step, we compare the corresponding characters, setting b true if they don't match. The matching is based on XOR of the character values:

'A' ^ 'T' = 10101
'C' ^ 'G' = 00100

'C' ^ 'T' = 10111
'G' ^ 'A' = 00110
'A' ^ 'C' = 00010
'T' ^ 'G' = 10011
 x  ^  x  = 00000

We can see in the above table that we want to record success for xx10x and failure for anything else, so we XOR with 00100 (four) and mask with 00110 (six) to get zero for AT or CG and non-zero otherwise. Finally, we return true if all the pairs accumulated a zero result in b, false otherwise.
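The XOR-and-mask test carries over to Python almost unchanged. This is a sketch of the same bit manipulation, with `is_wc_palindrome_mask` as an illustrative name; like the C code, it compares every pair from both directions.

```python
def is_wc_palindrome_mask(dna: str) -> bool:
    mismatch = 0
    for left, right in zip(dna, reversed(dna)):
        # Valid pairs XOR to xx10x; flipping bit 2 turns that into xx00x.
        mismatch |= ord(left) ^ ord(right) ^ 4
        # Keep only bits 1 and 2; any surviving bit marks a bad pair.
        mismatch &= 6
    return mismatch == 0
```

Because the mask discards bit 5, which is the only bit that differs between an uppercase ASCII letter and its lowercase form, mixed-case input such as `AtCgCGat` is accepted as well.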

Test program:

#include <stdio.h>
int main(int argc, char **argv)
{
    while (*++argv)
        printf("%s = %s\n", *argv, f(*argv)?"true":"false");
}

Licensed under cc by-sa 3.0 with attribution required.