Convert to camelCase


The Challenge

The other day I was reading the Google Java Style Guide and stumbled across their algorithm for converting any arbitrary string into camelCase notation. In this challenge you have to implement this algorithm, since you don't want to do all this in your head when you are writing your super-competitive Java submissions to code-golf challenges.

Note: I made some small adjustments to the algorithm. You need to use the one specified below.

The Algorithm

You start with an arbitrary input string and apply the following operations to it:

  1. Remove all apostrophes `'
  2. Split the result into words by splitting at
    • characters that are neither letters nor digits [^a-zA-Z0-9]
    • uppercase letters which are surrounded by lowercase letters on both sides. For example, abcDefGhI jk yields abc Def GhI jk
  3. Lowercase every word.
  4. Uppercase the first character of every word except the first.
  5. Join all words back together.
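The five steps translate fairly directly into code even without regular expressions. The following is a minimal, hypothetical reference sketch (the class name CamelCaseSketch is made up for illustration); it assumes ASCII input, as the notes below guarantee, and treats only the ASCII apostrophe ' as an apostrophe, whereas several answers below also strip backticks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class: a regex-free sketch of the five steps above.
final class CamelCaseSketch {

    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }
    private static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static String toCamelCase(String input) {
        // 1. Remove all apostrophes (only ' here; a backtick is treated as a delimiter).
        String s = input.replace("'", "");

        // 2. Split into words at [^a-zA-Z0-9] and before an uppercase letter
        //    that has a lowercase letter on both sides.
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isLower(c) && !isUpper(c) && !isDigit(c)) {
                // Delimiter: close the current word and drop the character.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            boolean surrounded = isUpper(c) && i > 0 && i + 1 < s.length()
                    && isLower(s.charAt(i - 1)) && isLower(s.charAt(i + 1));
            if (surrounded && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }

        // 3.-5. Lowercase every word, capitalize all but the first, and join.
        StringBuilder out = new StringBuilder();
        for (int w = 0; w < words.size(); w++) {
            String word = words.get(w).toLowerCase(Locale.ROOT);
            if (w == 0) {
                out.append(word);
            } else {
                // A leading digit is left unchanged by Character.toUpperCase.
                out.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("Programming Puzzles & Code Golf")); // prints programmingPuzzlesCodeGolf
    }
}
```

Scanning character by character keeps step 2 explicit: a delimiter closes the current word, and a lowercase-surrounded capital starts a new one.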

Additional Notes

  • The input will only contain printable ASCII.
  • If a digit is the first letter of a word, leave it as it is and don't capitalize anything else in this word.
  • The input will always have at least one character.
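The digit rule falls out naturally if step 4 only ever touches index 0 of a word. A hypothetical helper (the name CapitalizeWord is made up here) sketching that, assuming ASCII words:

```java
import java.util.Locale;

// Hypothetical helper illustrating the "digit first" note.
final class CapitalizeWord {

    // Lowercase the word, then uppercase only its first character.
    // Character.toUpperCase leaves digits unchanged, so "5PECIAL" becomes
    // "5pecial" rather than "5Pecial".
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        return Character.toUpperCase(lower.charAt(0)) + lower.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitalize("ca5e"));    // prints Ca5e
        System.out.println(capitalize("5PECIAL")); // prints 5pecial
    }
}
```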

Rules

Test Cases

"Programming Puzzles & Code Golf" -> "programmingPuzzlesCodeGolf"
"XML HTTP request" -> "xmlHttpRequest"
"supports IPv6 on iOS?" -> "supportsIpv6OnIos"
"SomeThing w1th, apo'strophe's and' punc]tuation" -> "someThingW1thApostrophesAndPuncTuation"
"nothing special" -> "nothingSpecial"
"5pecial ca5e" -> "5pecialCa5e"
"1337" -> "1337"
"1337-spEAk" -> "1337Speak"
"whatA mess" -> "whataMess"
"abcD" -> "abcd"
"a" -> "a"
"B" -> "b"

Happy Coding!


Interesting, I never knew this was called "camelCase". The name is fitting, I suppose...
Ashwin Gupta

There are others: snake_case & PascalCase
Martijn

@Martijn snake_case because of Python, of course. FORTH also has FORTHCASE and APL has unreadable in any case
cat

Test case 4 should have ApostropheS in the output.
Titus

@Titus No, it is correct. Apostrophes are removed before the input is split.
Denker

Answers:


Retina, 56 bytes

Byte count assumes ISO 8859-1 encoding.

T`'\`
S_`\W|_|(?<=[a-z])(?=[A-Z][a-z])
T`L`l
T`l`L`¶.
¶

Try it online!

Explanation

This implements the specification quite literally:

T`'\`

Remove apostrophes and backticks.

S_`\W|_|(?<=[a-z])(?=[A-Z][a-z])

Split the string around non-word characters (in regex, word characters include digits and underscores, so \W matches neither), or underscores, or positions that have a lowercase letter on the left and an uppercase letter followed by a lowercase letter on the right. This would create some empty segments when there are two non-letter, non-digit characters in a row, or, more importantly, at the beginning of the string. We get rid of those with the _ option. Here, "splitting" means putting each remaining part on its own line.
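For readers more at home in Java, the split stage and its _ option can be mirrored with java.util.regex. This is an illustrative port, not part of the Retina answer; the class name is hypothetical, and apostrophe removal is assumed to have happened already.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical Java port of the Retina split stage, for illustration only.
final class RetinaSplitStage {

    // Same pattern as the S_ stage. \W never matches digits or '_',
    // which is why '_' has to be listed as a separate alternative.
    private static final Pattern SPLIT =
            Pattern.compile("\\W|_|(?<=[a-z])(?=[A-Z][a-z])");

    // Split, then drop the empty segments, which is what the _ option does.
    static List<String> words(String s) {
        return Arrays.stream(SPLIT.split(s))
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // " & " yields an empty segment between '&' and ' ' that gets filtered out.
        System.out.println(words("Programming Puzzles & Code Golf")); // prints [Programming, Puzzles, Code, Golf]
    }
}
```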

T`L`l

Convert everything to lower case.

T`l`L`¶.

Convert each character that occurs after the linefeed to upper case. This will conveniently skip the first word because there's no linefeed in front of it.

¶

Get rid of the linefeeds to join everything back together.
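The last three stages (lowercase everything, uppercase whatever follows a linefeed, delete the linefeeds) can likewise be sketched in Java. Again this is a hypothetical illustration with a made-up class name, assuming its input is the one-word-per-line string left by the split stage.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the last three Retina stages, for illustration only.
final class RetinaCaseStages {

    private static final Pattern AFTER_LINEFEED = Pattern.compile("\n(.)");

    // Expects one word per line, as produced by the split stage.
    static String camelize(String lines) {
        // Stage T`L`l: lowercase everything.
        String lower = lines.toLowerCase(Locale.ROOT);

        // Stage T`l`L on "linefeed + any character": uppercase every character
        // that follows a linefeed. The first word has no linefeed in front of it.
        Matcher m = AFTER_LINEFEED.matcher(lower);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = "\n" + m.group(1).toUpperCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);

        // Final stage: replace every linefeed with nothing to join the words.
        return sb.toString().replace("\n", "");
    }

    public static void main(String[] args) {
        System.out.println(camelize("XML\nHTTP\nrequest")); // prints xmlHttpRequest
    }
}
```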


You beat me to it. Nice one!
mbomb007

This question may be a bit weird, but...should I post my answer if it's shorter than yours and also in Retina? I was working on it before your answer appeared, but then it did and now I don't know if I should post it.
daavko

@daavko Sure, post it (I usually decide based on how different the approach is to the existing answer... if it's the exact same thing with a byte shaved off somewhere I normally just comment on that answer, but if it's a lot shorter or a different approach, then I'd just post a separate answer).
Martin Ender

@daavko The lookaround is necessary though. Note that your answer doesn't retain the capitalisation of Thing although it should.
Martin Ender

@MartinBüttner Oh...I didn't notice that. Oh well, I'll successfully answer some other challenge, then.
daavko

Java, 198 190 bytes

+3 bytes because I forgot that \W+ == [^a-zA-Z0-9_]+ and I need to match [^a-zA-Z0-9]+

-11 bytes thanks to user20093 - ?: instead of if/else

Because, Java.

s->{String[]a=s.replaceAll("`|'","").split("[\\W_]+|(?<=[a-z])(?=[A-Z][a-z])");s="";for(String w:a){String t=w.toLowerCase();s+=a[0]==w?t:t.toUpperCase().charAt(0)+t.substring(1);}return s;}

This is a lambda. Call like so:

import java.util.function.UnaryOperator;

UnaryOperator<String> op = s->{String[]a=s.replaceAll("`|'","").split("[\\W_]+|(?<=[a-z])(?=[A-Z][a-z])");s="";for(String w:a){String t=w.toLowerCase();s+=a[0]==w?t:t.toUpperCase().charAt(0)+t.substring(1);}return s;};
System.out.println(op.apply("Programming Puzzles & Code Golf"));

Readable version:

public static String toCamelCase(String s) {
    String[] tokens = s
            .replaceAll("`|'", "") // 1. Remove all apostrophes
            .split("[\\W_]+|(?<=[a-z])(?=[A-Z][a-z])"); // 2. Split on [\W_]+ or between [a-z] and [A-Z][a-z]
    s = ""; // Reusing s for building output is cheap
    for (int i = 0; i < tokens.length; i++) {
        String lowercaseToken = tokens[i].toLowerCase(); // 3. Lowercase every word
        s += i == 0 ? lowercaseToken : lowercaseToken.toUpperCase().charAt(0) + lowercaseToken.substring(1); // 4. Uppercase first char of all but first word (index check, so a repeated word like "a a" still works)
        // ^ 5. Join all words back together
    }
    return s;
}

It's not Swift...
CalculatorFeline

Welcome to Programming Puzzles & Code Golf! This is a nice first answer!
Alex A.

@CatsAreFluffy What?
cat

If you replace the conditional statement (if/else) with a conditional expression (?:) you could save around 9 bytes
user902383

Don't know how I missed that @user902383 - added for -11 bytes. Unfortunately I had to add 3 as well to match _ as a token delimiter.
CAD97

JavaScript (ES6), 156 154 152 148 145 141 140 bytes

Thanks @Neil (6 bytes), @ETHproductions (3 bytes), and @edc65 (7 bytes)

a=>a[r='replace'](/`|'/g,a='')[r](/[a-z](?=[A-Z][a-z])/g,'$& ')[r](/[^\W_]+/g,b=>a+=(a?b[0].toUpperCase():'')+b.slice(!!a).toLowerCase())&&a

Removes apostrophes, then does a replace to split on special characters/before surrounded capitals, then combines with proper casing. Unfortunately, toLowerCase() and toUpperCase() are annoyingly long and hard to avoid here...
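The same moves (strip apostrophes, insert a space after a lowercase letter that precedes a lowercase-surrounded capital, then walk the alphanumeric runs) carry over to Java's java.util.regex. This is a hypothetical port for comparison with a made-up class name; a Matcher.find loop plays the role of the replace callback.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the replace-then-match idea, for illustration only.
final class ReplaceThenMatch {

    // [^\W_] is a word character other than '_', i.e. [A-Za-z0-9] for ASCII input.
    private static final Pattern WORD = Pattern.compile("[^\\W_]+");

    static String toCamelCase(String s) {
        String spaced = s.replaceAll("[`']", "")
                // Put a space after a lowercase letter that precedes [A-Z][a-z].
                .replaceAll("[a-z](?=[A-Z][a-z])", "$0 ");
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(spaced);
        while (m.find()) {
            String w = m.group().toLowerCase(Locale.ROOT);
            // The first word stays lowercase; later words get an uppercase first character.
            out.append(out.length() == 0 ? w : Character.toUpperCase(w.charAt(0)) + w.substring(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("SomeThing w1th, apo'strophe's and' punc]tuation")); // prints someThingW1thApostrophesAndPuncTuation
    }
}
```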


I was working on a different approach which your b.slice(i>0) approach blows out of the water, but in the meantime my match regex of /[A-Z]?([a-z0-9]|[0-9A-Z]{2,})+([A-Z](?![a-z]))?/g does appear to save 2 bytes over your otherwise ingenious replace approach.
Neil

Or I could just save 2 bytes on your replace directly: replace(/[a-z](?=[A-Z][a-z])/g,'$& ')
Neil

Usually match...map can be replaced with replace
edc65

@edc65 I get a minimum of 160 bytes with that approach: a=>a.replace(/`|'/g,'').replace(/[a-z](?=[A-Z][a-z])/g,'$& ').replace(/[\W_]*([a-z0-9]+)[\W_]*/gi,(_,b,i)=>(i?b[0].toUpperCase():'')+b.slice(i>0).toLowerCase())
ETHproductions

On the other hand, I would like to offer b=>a+=(a?b[0].toUpperCase():'')+b.slice(!!a).toLowerCase() which I believe saves you another 4 bytes.
Neil

vim, 69 68 66

:s/[`']//g<cr>:s/[a-z]\zs\ze[A-Z][a-z]\|\W\|_/\r/g<cr>o<esc>guggj<C-v>GgU:%s/\n<cr>

vim shorter than Perl?! What is this madness?

:s/[`']//g<cr>           remove ` and '
:s/                      match...
 [a-z]\zs\ze[A-Z][a-z]   right before a lowercase-surrounded uppercase letter
 \|\W\|_                 or a non-word char or underscore
 /\r/g<cr>               insert newlines between parts
o<esc>                   add an extra line at the end, necessary later...
gugg                     lowercasify everything
j                        go to line 2 (this is why we added the extra line)
<C-v>G                   visual-select the first char of every line but the first
gU                       uppercase
:%s/\n<cr>               join all lines into one

Thanks to Neil for spotting a useless keystroke!


I can see why the last :s has a % but why the inconsistency in the first two?
Neil

@Neil Bah, muscle memory. Thanks!
Doorknob

Manages to be less readable than Perl, too +1
cat

I'm totally adding this to my .vimrc
moopet

@fruglemonkey 1. :%j<cr> is equivalent and shorter. 2. That adds spaces between lines.
Doorknob

Mathematica 10.1, 101 bytes

""<>(ToCamelCase@{##2}~Prepend~ToLowerCase@#&@@StringCases[StringDelete[#,"`"|"'"],WordCharacter..])&

Uses the undocumented ToCamelCase, which works similarly to Capitalize but sets other characters to lowercase.


Not in 10.3.0..
A Simmons

Is ToCamelCase[n_,m_]:=n<>Capitalize/@m correct? Seems like it. And why use Prepend when #~ToCamelCase~{##2} works?
CalculatorFeline

@CatsAreFluffy That gives me ToCamelCase::argx: ToCamelCase called with 2 arguments; 1 argument is expected.
LegionMammal978

Well, how does CamelCase work? Just ToCamelCase[n_]:=""<>Capitalize/@n?
CalculatorFeline

@CatsAreFluffy, see this.
LegionMammal978

Julia, 98 89 bytes

s->lcfirst(join(map(ucfirst,split(replace(s,r"['`]",""),r"[a-z]\K(?=[A-Z][a-z])|\W|_"))))

This is an anonymous function that accepts a string and returns a string. To call it, assign it to a variable.

The approach here is the same as in Doorknob's Perl answer: replace apostrophes and backticks with the empty string, split into an array on a regular expression that matches the necessary cases, map the ucfirst function over the array to uppercase the first letter of each element, join the array back into a string, and lcfirst the result to convert the first character to lowercase.
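A hypothetical Java sketch of the same split, map, join, lcfirst pipeline follows (class and helper names are made up). java.util.regex has no \K, so a lookbehind takes its place, and this sketch additionally lowercases each piece before capitalizing it, as the map{ucfirst lc} step in the Perl answer below does.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical Java sketch of the split / map / join / lcfirst pipeline.
final class MapJoinLcfirst {

    static String ucfirst(String w) {
        return w.isEmpty() ? w : Character.toUpperCase(w.charAt(0)) + w.substring(1);
    }

    static String lcfirst(String w) {
        return w.isEmpty() ? w : Character.toLowerCase(w.charAt(0)) + w.substring(1);
    }

    static String toCamelCase(String s) {
        String joined = Arrays.stream(s.replaceAll("['`]", "")
                        .split("(?<=[a-z])(?=[A-Z][a-z])|\\W|_"))
                // Lowercase each piece before capitalizing it (like Perl's ucfirst lc).
                .map(w -> ucfirst(w.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining());
        // Lowercasing the first character of the joined string also copes with
        // empty pieces, e.g. the one a leading delimiter produces.
        return lcfirst(joined);
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("&hello world")); // prints helloWorld
    }
}
```

Applying lcfirst to the whole result rather than to the first piece is what makes a leading delimiter harmless.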


I've always liked Julia as a more functional, more interesting Python but I hate the end syntax. Maybe I'll just use anonymous functions for everything, then I never have to type end :D
cat

Perl 67 + 1 = 68 bytes

y/'`//d;s/([a-z](?=[A-Z][a-z]))|\W|_/$1 /g;$_=lc;s/^ +| +(.)/\u$1/g

Requires the -p flag, and -l for multi line:

$ perl -pl camelCase.pl input.txt
programmingPuzzlesCodeGolf
xmlHttpRequest
supportsIpv6OnIos:
someThingW1thApostrophesAndPuncTuation
nothingSpecial
5pecialCa5e
1337
1337Speak
abcd

How it works:

y/'`//d;                            # Remove ' and `
s/([a-z](?=[A-Z][a-z]))|\W|_/$1 /g; # Replace according to '2. Split...' this will create
                                    #   a space separated string.
$_=lc;                              # lower case string
s/^ +| +(.)/\u$1/g                  # CamelCase the space separated string and remove any
                                    #   potential leading spaces.
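The same substitution chain can be written against java.util.regex. This is a hypothetical Java sketch for illustration (class name made up). A trailing delimiter such as the ? in "supports IPv6 on iOS?" would otherwise leave a trailing space that ^ +| +(.) never removes, so the sketch also strips trailing spaces, a step that is not in the Perl line above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java sketch of the space-separated approach.
final class SpaceSeparated {

    private static final Pattern CAMEL = Pattern.compile("^ +| +(.)");

    static String toCamelCase(String s) {
        String spaced = s.replaceAll("['`]", "")
                // Like s/([a-z](?=[A-Z][a-z]))|\W|_/$1 /g: keep the lowercase letter,
                // turn every split point into a space (an unmatched $1 becomes "").
                .replaceAll("([a-z](?=[A-Z][a-z]))|\\W|_", "$1 ")
                .toLowerCase(Locale.ROOT)
                // Added in this sketch: drop trailing spaces left by a trailing delimiter.
                .replaceAll(" +$", "");
        Matcher m = CAMEL.matcher(spaced);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String next = m.group(1); // null when the leading-spaces branch matched
            String replacement = next == null ? "" : next.toUpperCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("supports IPv6 on iOS?")); // prints supportsIpv6OnIos
    }
}
```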

Perl, 87 80 78 bytes

y/'`//d;$_=join'',map{ucfirst lc}split/[a-z]\K(?=[A-Z][a-z])|\W|_/,$_;lcfirst

Byte added for the -p flag.

First, we use the y/// transliteration operator to delete all '` characters in the input:

y/'`//d;

Then comes the meat of the code:

                         split/[a-z]\K(?=[A-Z][a-z])|\W|_/,$_;

(split the input string $_ in the appropriate locations, using the fancy \K in the match string to exclude the portion preceding it from the actual match)

          map{ucfirst lc}

(map over each split portion of the string and make the entire string lowercase, then make the first character of the modified string uppercase)

$_=join'',

(join on empty string and re-assign to magic underscore $_, which gets printed at the end)

Finally, we lowercase the first letter with the lcfirst builtin, which saves 2 bytes over the previous method of regex-matching it and using \l in the replacement string:

lcfirst

Thanks to @MartinBüttner for 7 bytes ([^a-zA-Z\d] -> \W|_)!


How I envy that \K... ;)
Martin Ender

Lua, 127 Bytes

t=''l=t.lower z=io.read()for x in z:gmatch('%w+')do t=t..(t==''and l(x:sub(1,1))or x:sub(1,1):upper())..l(x:sub(2))end return t

Accepts a string from stdin and returns camelized results.

Probably still gonna look for a better solution as storing everything in a variable feels inefficient.

But anyhow, pretty simple in general:

 z:gmatch('%w+')

This is the beauty that saved me a few bytes. gmatch will split the string based on the pattern %w+, which grabs only alphanumerics.

After that it's simple string operations. string.upper, string.lower and done.


PHP, 145 122 133 bytes

<?=join(split(" ",lcfirst(ucwords(strtolower(preg_replace(["#`|'#","#\W|_#","#([a-z])([A-Z][a-z])#"],[""," ","$1 $2"],$argv[1]))))));

Save to file, call from CLI.
Takes input from a single command line argument; escape quotes and whitespace where necessary.

breakdown

<?=                 // 9. print result
join(split(" ",     // 8. remove spaces
    lcfirst(        // 7. lowercase first character
    ucwords(        // 6. uppercase first character in every word
    strtolower(     // 5. lowercase everything
    preg_replace(
        ["#`|'#",   "#\W|_#",   "#([a-z])([A-Z][a-z])#"],
        ["",        " ",        "$1 $2"],
        // 2. replace apostrophes with empty string (remove them)
                    // 3. replace non-word characters with space
                                // 4. insert space before a solitary uppercase letter
        $argv[1]    // 1. take input from command line
    ))))
));

lcfirst allowed to reduce this to a single command, saving 23 bytes.
Fixing the apostrophes cost 11 bytes for the additional replace case.


Kotlin, 160 Bytes

fun a(s: String)=s.replace(Regex("['`]"),"").split(Regex("[\\W_]+|(?<=[a-z])(?=[A-Z][a-z])")).map{it.toLowerCase().capitalize()}.joinToString("").decapitalize()

My goal was to beat Scala, the other "alternative Java", so I'm somewhat happy with my results. I stole the regex from the Java answer.

Test it with:

fun main(args: Array<String>) {
    val testCases = arrayOf(
            "Programming Puzzles & Code Golf",
            "XML HTTP request",
            "supports IPv6 on iOS?",
            "SomeThing w1th, apo'strophe's and' punc]tuation",
            "nothing special",
            "5pecial ca5e",
            "1337",
            "1337-spEAk",
            "abcD",
            "a",
            "B")
    testCases.forEach { println(a(it)) }

}

At this point I think everyone is "borrowing" the optimized regex \W|_|(?<=[a-z])(?=[A-Z][a-z]) or slightly modifying it, e.g. [\W_]+
CAD97

you can save some on map and extension function fun String.a()=replace(Regex("['`]"),"").split(Regex("[\\W_]+|(?<=[a-z])(?=[A-Z][a-z])")).joinToString(""){it.toLowerCase().capitalize()}.decapitalize()
poss

Scala, 181 170 144

def f(s:String)={val l=s.replaceAll("'|`","")split("[\\W_]+|(?<=[a-z])(?=[A-Z][a-z])")map(_.toLowerCase);l(0)+l.tail.map(_.capitalize).mkString}

Tester:

val testCases = List(
  "Programming Puzzles & Code Golf" -> "programmingPuzzlesCodeGolf",
  "XML HTTP request" -> "xmlHttpRequest"
  // etc
)
println(testCases.map(t=>if(t._2!=f(t._1))s"FAIL:${f(t._1)}"else"PASS").mkString("\n"))

Props to CAD97 and apologies to Nathan Merrill :)


You can save 6 bytes by replacing [^a-zA-Z0-9]+ with [\\W_]+.
CAD97

C 272 characters

A C program: pass the string to camelCase, in quotes, as argument 1. There are lots of gotchas in this problem statement...

#define S strlen(t)
#define A isalnum(t[i])
j=0;main(i,v)char**v;{char*p=v[1],*t;char o[99]={0};while(t=strtok(p," [{(~!@#$%^*-+=)}]")){i=0;p+=S+1;while((!A)&&i<S)i++;if(i!=S){o[j]=((j++==0)?tolower(t[i++]):toupper(t[i++]));while(i<S){if(A)o[j++]=t[i];i++;}}}puts(o);}

You need to #include<string.h> for strlen and strtok, and #include<ctype.h> for isalnum, tolower, and toupper.
Mego

I didn't need it using gcc 3.4.4 in cygwin. They must be automatically linked in, assuming extern int.
cleblanc

With ./camel "Programming Puzzles & Code Golf" on cygwin (compiled with gcc 3.4.4), I get programmingPuzzlesCodeEGolf. Same output with 5.3.0.
Mego

Crap, me too. I must've created a bug while golfing it. I'm looking at it now...
cleblanc

The problem was I added the other tokenizer strings after golfing and didn't test it well enough. If you remove the '&' from the strtok call it works on that input.
cleblanc

JavaScript, 123 bytes

v=>v[r="replace"](/[`']/g,"")[r](/^.|.$|[A-Z][^a-z]+/g,x=>x.toLowerCase())[r](/[^a-z0-9]+./ig,x=>x.slice(-1).toUpperCase())

Readable version

v=>
  v.replace(/[`']/g,"")
  .replace(/^.|.$|[A-Z][^a-z]+/g,x=>x.toLowerCase())
  .replace(/[^a-z0-9]+./ig,x=>x.slice(-1).toUpperCase())

Remove the apostrophes; lowercase the first character, the last character, and any uppercase letter followed by one or more non-lowercase characters (such as a run of capitals); then match every group of one or more non-alphanumeric characters plus the character after it, and replace that group with its last character capitalized.

[r="replace"] trick from Mrw247's solution.

Licensed under cc by-sa 3.0 with attribution required.