Which language is the shortest?

Create a program that finds the latest 50 challenges with the code-golf tag that have at least 20 answers. Then, extract the scores for each language in each of the challenges. If there is more than one answer in the same language, count all of the scores. After that, take the 20 most common languages and output a list with the language names, the number of answers, the average byte count and the median byte count. The list should be sorted by number of answers, in descending order.

You must account for variations in capitalization (for instance: Matlab = MATLAB).

In languages with many different version numbers (e.g. Python), count them as unique languages, so: Python != Python 2 != Python 2.7 != Python 3.x
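One way to satisfy both naming rules is to fold case only and leave version suffixes alone. A minimal sketch in Python; the helper name language_key is hypothetical and not part of the challenge:

```python
def language_key(name: str) -> str:
    """Fold case and collapse whitespace, but keep version suffixes distinct."""
    return " ".join(name.split()).upper()


# Capitalization variants collapse to one key.
print(language_key("Matlab") == language_key("MATLAB"))

# Version variants stay separate languages.
versions = ["Python", "Python 2", "Python 2.7", "Python 3.x"]
print(len({language_key(v) for v in versions}))
```

Upper-casing is the same choice both answers below make; any consistent case folding would do.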

Example output (the output format is optional):

cJam,       66,   12.4,  8.5
Pyth,       58,   15.2,  19
Ruby,       44,   19.2,  22.5
Python,     34,   29.3,  32
Python 2.7, 22,   31.2,  40
...
...
Java,       11,   115.5, 94.5
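The aggregation step could be sketched as follows, assuming the scores have already been collected as (language, byte count) pairs. The function name summarize and the sample data are made up for illustration:

```python
from collections import defaultdict
from statistics import mean, median


def summarize(scores, top=20):
    """Return (language, answers, mean, median) rows, most-answered first.

    scores is an iterable of (language, byte_count) pairs, one per answer.
    """
    by_language = defaultdict(list)
    for language, count in scores:
        by_language[language].append(count)
    # sorted() is stable, so languages with equal counts keep first-seen order.
    ranked = sorted(by_language.items(), key=lambda item: len(item[1]), reverse=True)
    return [(lang, len(c), mean(c), median(c)) for lang, c in ranked[:top]]


# Hypothetical sample data, not real scores from the site.
sample = [("CJAM", 10), ("PYTH", 20), ("CJAM", 14), ("CJAM", 30)]
for lang, n, avg, med in summarize(sample):
    print(f"{lang}, {n}, {avg}, {med}")
```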

Header formats that must be supported:

- Starts with "# Language name," or "#Language name"
- Ends with "xx bytes", "xx Bytes" or just "xx"
- There can be a lot of garbage between the first comma and the last number.
- If the language name is a link ([Name](link)), it can be skipped

If the answer has another header format, you may choose to skip it (or include it if your code can handle it).

As an example, all of the following headers must be supported:

# Language Name, N bytes
# Ruby, <s>104</s> <s>101</s> 96 bytes 
# Perl, 43 + 2 (-p flag) = 45 Bytes
# MATLAB, 5
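A regular expression that accepts these header shapes might look like the sketch below. The pattern and the function name parse_header are assumptions for illustration, not the method used by either answer; link-style language names are skipped, which the rules allow.

```python
import re

# "#", optional space, language up to the first comma, arbitrary filler,
# then the last number, optionally followed by "bytes" in any capitalization.
HEADER = re.compile(
    r"^#\s*(?P<language>[^,]+?)\s*,.*?(?P<count>\d+)\s*(?:bytes)?\s*$",
    re.IGNORECASE,
)


def parse_header(line):
    """Return (LANGUAGE, byte_count), or None if the header is not recognised."""
    match = HEADER.match(line.strip())
    if match is None or match["language"].startswith("["):
        return None
    return match["language"].upper(), int(match["count"])


for header in [
    "# Ruby, <s>104</s> <s>101</s> 96 bytes ",
    "# Perl, 43 + 2 (-p flag) = 45 Bytes",
    "# MATLAB, 5",
]:
    print(parse_header(header))
```

Because the filler part is matched lazily and the number must sit at the end of the line, struck-through earlier scores and arithmetic such as "43 + 2" are passed over in favour of the final count.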

Rules:

- It's OK to use the API or just the website URL.
- The following can be subtracted from the byte count (nothing else), so there's no need to use a URL shortener (maximum 44 bytes):
  - https:// (or http://)
  - codegolf
  - .stackexchange.com
  - /questions
- The program can take input. The input will be included in the byte count.
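The 44-byte maximum is simply the combined length of the four URL pieces. A small sketch of the arithmetic; the function name adjusted_score is hypothetical, and it assumes the full discount applies:

```python
# The four URL pieces that may be subtracted from the byte count.
DISCOUNTED_PARTS = ["https://", "codegolf", ".stackexchange.com", "/questions"]

# 8 + 8 + 18 + 10 bytes
MAX_DISCOUNT = sum(len(part) for part in DISCOUNTED_PARTS)


def adjusted_score(raw_bytes, discount=MAX_DISCOUNT):
    """Byte count after subtracting the URL discount."""
    return raw_bytes - discount


print(MAX_DISCOUNT)
print(adjusted_score(821))
```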

Other than that, standard rules apply.

code-golf internet

— Stewie Griffin

I could tell you that it's Pyth without doing this challenge.

— Alex A.

Is the "bytes" suffix common, let alone universal, enough to require it?

— Sparr

@StewieGriffin I think Sparr is saying that while it is common, it is not always used.

— Celeo

As far as I can see, xx bytes is very common on recent challenges (at least since the leaderboard snippet was created).

— Stewie Griffin

I usually use "chars" or "characters" instead of "bytes"

— Doorknob

Answers:

R, 821 - 44 = 777 bytes

Updated results: see the edit history to make sense of all the comments below.

           language num_answers avg_count median_count
1              RUBY          49  49.97959         30.0
2              CJAM          48  32.64583         22.0
3              PYTH          48  21.02083         14.0
4          PYTHON 2          46  86.78261         77.0
5             JULIA          43  58.90698         45.0
6           HASKELL          41  74.65854         56.0
7               PHP          40  73.52500         48.0
8              PERL          36  53.30556         34.0
9          PYTHON 3          34  90.91176         90.5
10       POWERSHELL          33  60.24242         44.0
11                C          32 221.84375         79.5
12                R          32  77.40625         62.5
13             JAVA          29 170.68966        158.0
14 JAVASCRIPT (ES6)          29  90.79310         83.0
15       JAVASCRIPT          28  68.39286         61.0
16               C#          25 193.92000        130.0
17      MATHEMATICA          23  56.04348         47.0
18           MATLAB          22  67.45455         55.0
19         TI-BASIC          19  47.05263         37.0
20              APL          18  16.55556         15.0

The code, which I could shorten a bit more:

W=library;W(XML);W(plyr)
X=xpathSApply;Y=xmlValue;D=data.frame;H=htmlParse;S=sprintf
Z="http://codegolf.stackexchange.com/"
R=function(FUN,...)do.call(rbind,Map(FUN,...))
G=function(url){d=H(url)
a=as.double(sub(".*?(\\d+)a.*","\\1",X(d,"//div[starts-with(@class,'status')]",Y)))
u=paste0(Z,X(d,"//*[contains(@class,'question-hyperlink')]",xmlGetAttr,"href"))
D(u,a)}
u=S("%s/questions/tagged/code-golf?page=%i",Z,1:50)
q=R(G,u)
u=with(q,head(u[a>20],50))
A=function(url){u=S("%s?page=%i",url,1:10)
f=function(u){d=H(u)
h=X(d, "//div[@class='post-text']//h1",Y)
p="^(.*?),.*? (\\d+)( [Bb]ytes)?$"
k=grep(p,h,v=T)
l=toupper(sub(p,"\\1",k))
c=as.double(sub(p,"\\2",k))
D(l,c)}
R(f,u)}
a=R(A,u)
L=names(tail(sort(table(a$l)),20))
x=subset(a,l%in%L)
arrange(ddply(x, "l",summarise,n=length(c),a=mean(c),m=quantile(c,0.5)),-n)

De-golfed:

library(XML)
library(plyr)

# Apply FUN over the arguments and stack the resulting data frames.
LoopBind <- function(FUN, ...) do.call(rbind, Map(FUN, ...))

# Scrape one listing page: question URLs and their answer counts.
GetQuestions <- function(url) {
  d = htmlParse(url)
  a=as.double(sub(".*?(\\d+)a.*","\\1",xpathSApply(d, "//div[starts-with(@class, 'status')]", xmlValue)))
  u=paste0("http://codegolf.stackexchange.com/",xpathSApply(d, "//*[contains(@class, 'question-hyperlink')]", xmlGetAttr, "href"))
  data.frame(u, a)
}
u <- sprintf("http://codegolf.stackexchange.com/questions/tagged/code-golf?page=%i", 1:50)
q <- do.call(rbind, Map(GetQuestions, u))
# Keep the first 50 questions that have more than 20 answers.
u <- with(q, head(u[a > 20], 50))

# Scrape up to 10 pages of answers for one question and parse the headers.
GetAnswers <- function(url) {
  u=sprintf("%s?page=%i",url,1:10)
  f=function(u) {
    d = htmlParse(u)
    h = xpathSApply(d, "//div[@class='post-text']//h1", xmlValue)
    # Language up to the first comma; trailing number, optionally followed by " bytes".
    p = "^(.*?),.*? (\\d+)( [Bb]ytes)?$"
    k = grep(p,h,value=TRUE)
    l = toupper(sub(p,"\\1",k))
    c = as.double(sub(p,"\\2",k))
    data.frame(language=l,c)
  }
  LoopBind(f,u)
}
a=LoopBind(GetAnswers, u)
# The 20 most common languages, then count, mean and median per language.
L=names(tail(sort(table(a$language)),20))
x=subset(a,language%in%L)
arrange(ddply(x, "language", summarise, num_answers = length(c), avg_count = mean(c), median_count = quantile(c,0.5)),
        -num_answers)

— flodel

How is the average length for C# over 6000 bytes?

— SuperJedi224

@SuperJedi224 - There may be some extremely long submissions skewing the average. That is why the median is a useful statistic: it is resistant to outliers.
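To illustrate with made-up numbers (these are not real scores from the answers), a single very long submission is enough to push the mean into the thousands while the median barely moves:

```python
from statistics import mean, median

# Hypothetical byte counts: four ordinary answers and one huge outlier.
counts = [120, 130, 140, 150, 30000]

print(mean(counts))    # dominated by the outlier
print(median(counts))  # stays with the typical answers
```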

I read somewhere that C# is the least golfable language. Now I know why...

— ev3commander

@ev3commander - C# pales in comparison to Unary...

— Comintern

@Comintern: Eek...

— ev3commander

Python 2, 934 - 44 (URL stuff) = 890 bytes

Using the API:

from urllib2 import urlopen as u
from gzip import GzipFile as f
from StringIO import StringIO as s;x="https://api.stackexchange.com/2.2%s&site=codegolf"
import re;j=u(x%'/search/advanced?pagesize=50&order=desc&sort=creation&answers=20&tagged=code-golf');q=s(j.read());g=f(fileobj=q);true=1;false=0;l=';'.join(str(a['question_id'])for a in eval(g.read())['items']);w=[]
def r(p):
 j=u(x%('/questions/%s/answers?page=%s&filter=!9YdnSMlgz&pagesize=100'%(l,p)));g.seek(0);q.truncate();q.write(j.read());q.seek(0);k=eval(g.read());w.extend(a['body_markdown']for a in k['items'])
 if k['has_more']:r(p+1)
r(1);x={};s=sorted
for m in w:
 try:
  l,n=re.match("(.*?),.*?([0-9]+)[^0-9]*$",m.splitlines()[0]).groups();l=re.subn("# ?","",l,1)[0].upper()
  if l not in x:x[l]=[]
  x[l]+=[(l,int(n))]
 except:pass
for l in s(x,cmp,lambda a:len(x[a]),1)[:20]:
 v=s(x[l])
 print l,len(v),sum(map(lambda a:a[1],v))/len(v),v[len(v)/2][1]

Note that this code does not pay attention to API throttling.
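For reference, a throttling-aware paging loop could look roughly like the sketch below. It is written in Python 3 rather than golfed Python 2, and it assumes the response wrapper carries the items, has_more and backoff fields (backoff being a number of seconds to wait). The fetch_page callable is hypothetical and stands in for whatever performs the HTTP request and decodes the JSON.

```python
import time


def fetch_all_pages(fetch_page, sleep=time.sleep):
    """Collect 'items' from every page, pausing whenever the API asks for it.

    fetch_page(page) is a hypothetical callable that returns the decoded
    JSON wrapper (a dict) for one page of results.
    """
    items, page = [], 1
    while True:
        wrapper = fetch_page(page)
        items.extend(wrapper.get("items", []))
        # Honour a requested pause before issuing the next call.
        if wrapper.get("backoff"):
            sleep(wrapper["backoff"])
        if not wrapper.get("has_more"):
            return items
        page += 1


# Offline demonstration with canned responses instead of real API calls.
canned = {
    1: {"items": ["a", "b"], "has_more": True, "backoff": 3},
    2: {"items": ["c"], "has_more": False},
}
waits = []
print(fetch_all_pages(canned.__getitem__, sleep=waits.append), waits)
```

Passing the sleep function in as a parameter keeps the loop testable without real delays.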

Output:

RUBY 60 430 32
PYTH 57 426 16
CJAM 56 35 23
C 52 170 76
PYTHON 2 51 88 79
JULIA 42 63 48
HASKELL 42 81 63
JAVASCRIPT (ES6) 41 96 83
PERL 40 44 27
PYTHON 3 37 91 89
PHP 36 98 59
JAVASCRIPT 36 743 65
POWERSHELL 35 86 44
JAVA 32 188 171
R 30 73 48
MATLAB 25 73 51
MATHEMATICA 24 57 47
APL 22 14 13
SCALA 21 204 59
TI-BASIC 21 42 24

— pppery

@StewieGriffin Interestingly, I had to add an extra slash to the second recursive query to qualify for the /questions reduction.

— pppery

The differences are because @flodel doesn't allow suffixes other than bytes, while mine will handle other suffixes like chars.

— pppery
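The difference can be seen by running the two header patterns from the answers side by side. This sketch uses Python's re module for both, so it only approximates how R's regex engine treats the first pattern; the helper name accepts is made up.

```python
import re

# Header pattern from the R answer: trailing number, optionally followed by " bytes".
STRICT = re.compile(r"^(.*?),.*? (\d+)( [Bb]ytes)?$")
# Header pattern from the Python answer: last number, then any non-digit tail.
LENIENT = re.compile(r"(.*?),.*?([0-9]+)[^0-9]*$")


def accepts(pattern, header):
    """Return True if the pattern recognises the header line."""
    return pattern.match(header) is not None


for header in ("Ruby, 96 bytes", "Ruby, 96 chars"):
    print(header, accepts(STRICT, header), accepts(LENIENT, header))
```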

Is it possible that your code combines C, C# and possibly C++? It seems unlikely that there are 73 answers in C.

— Stewie Griffin

No, I don't think so. I terminate the language name at the first comma.

— pppery

It looks like l=re.sub("# ?|,","",l) is what replaces C# with C.

— flodel
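A quick check of that observation, comparing the expression quoted in the comment with the count-limited re.subn call that appears in the code as posted. The function names strip_all and strip_first are made up for illustration:

```python
import re


def strip_all(name):
    """Expression quoted in the comment: removes every '#' (and comma)."""
    return re.sub(r"# ?|,", "", name)


def strip_first(name):
    """Expression in the posted code: removes only the first '#'."""
    return re.subn(r"# ?", "", name, count=1)[0]


header_language = "# C#"
print(strip_all(header_language))    # the trailing '#' is stripped too
print(strip_first(header_language))  # the language name survives intact
```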