How do I sync files to Amazon S3 with s3cmd, check that they were uploaded, and remove them locally?


I'm trying to use the Amazon S3 service to store the logs from my applications. Running /usr/bin/s3cmd --help tells me what I need to know about uploading files:

s3cmd --help
usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool.
  -c FILE, --config=FILE
                        Config file name. Defaults to
  --dump-config         Dump current configuration after parsing config files
                        and command line options and exit.
  -n, --dry-run         Only show what should be uploaded or downloaded but
                        don't actually do it. May still perform S3 requests to
                        get bucket listings and other information though (only
                        for file transfer commands)
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  --continue            Continue getting a partially downloaded file (only for
                        [get] command).
  --skip-existing       Skip over files that exist at the destination (only
                        for [get] and [sync] commands).
  -r, --recursive       Recursive upload, download or removal.
  --check-md5           Check MD5 sums when comparing files for [sync].
  --no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you
  --acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID
                        Grant stated permission to a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --acl-revoke=PERMISSION:USER_CANONICAL_ID
                        Revoke stated permission for a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --delete-removed      Delete remote objects with no corresponding local file
  --no-delete-removed   Don't delete remote objects.
  -p, --preserve        Preserve filesystem attributes (mode, ownership,
                        timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded
                        from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular
                        expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --include=GLOB        Filenames and paths matching GLOB will be included
                        even if previously excluded by one of
                        --(r)exclude(-from) patterns
  --include-from=FILE   Read --include GLOBs from FILE
  --rinclude=REGEXP     Same as --include but uses REGEXP (regular expression)
                        instead of GLOB
  --rinclude-from=FILE  Read --rinclude REGEXPs from FILE
  --bucket-location=BUCKET_LOCATION
                        Datacentre to create bucket in. As of now the
                        datacenters are: US (default), EU, us-west-1, and ap-
  --reduced-redundancy, --rr
                        Store object with 'Reduced redundancy'. Lower per-GB
                        price. [put, cp, mv]
  --access-logging-target-prefix=LOG_TARGET_PREFIX
                        Target prefix for access logs (S3 URI) (for [cfmodify]
                        and [accesslog] commands)
  --no-access-logging   Disable access logging (for [cfmodify] and [accesslog]
  -m MIME/TYPE, --mime-type=MIME/TYPE
                        Default MIME-type to be set for objects stored.
  -M, --guess-mime-type
                        Guess MIME-type of files by their extension. Falls
                        back to default MIME-Type as specified by --mime-type
  --add-header=NAME:VALUE
                        Add a given HTTP header to the upload request. Can be
                        used multiple times. For instance set 'Expires' or
                        'Cache-Control' headers (or both) using this options
                        if you like.
  --encoding=ENCODING   Override autodetected terminal and filesystem encoding
                        (character set). Autodetected: UTF-8
  --verbatim            Use the S3 name as given on the command line. No pre-
                        processing, encoding, etc. Use with caution!
  --list-md5            Include MD5 sums in bucket listings (only for 'ls'
  -H, --human-readable-sizes
                        Print sizes in human readable form (eg 1kB instead of
  --progress            Display progress meter (default on TTY).
  --no-progress         Don't display progress meter (default on non-TTY).
  --enable              Enable given CloudFront distribution (only for
                        [cfmodify] command)
  --disable             Enable given CloudFront distribution (only for
                        [cfmodify] command)
  --cf-add-cname=CNAME  Add given CNAME to a CloudFront distribution (only for
                        [cfcreate] and [cfmodify] commands)
  --cf-remove-cname=CNAME
                        Remove given CNAME from a CloudFront distribution
                        (only for [cfmodify] command)
  --cf-comment=COMMENT  Set COMMENT for a given CloudFront distribution (only
                        for [cfcreate] and [cfmodify] commands)
  --cf-default-root-object=DEFAULT_ROOT_OBJECT
                        Set the default root object to return when no object
                        is specified in the URL. Use a relative path, i.e.
                        default/index.html instead of /default/index.html or
                        s3://bucket/default/index.html (only for [cfcreate]
                        and [cfmodify] commands)
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (1.0.0) and exit.
  -F, --follow-symlinks
                        Follow symbolic links as if they are regular files

But it doesn't say how to check whether a file was uploaded, or how to remove the ones that were. Should I check via MD5 and delete them locally from a shell script?

Use --check-md5 to determine whether the files you uploaded are in sync.

Do I add this parameter when I use the put operation? Or afterwards?
Valter Silva

It's doubtful that it matters, going by the description of --check-md5. Try both with a sample file and post the results.
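Per the help text above, --check-md5 and --no-check-md5 only affect how [sync] compares files: with --no-check-md5 only the size is compared, otherwise the MD5 sum is checked too. The sketch below mimics that documented decision for a single file so the trade-off can be tried on a sample file. needs_upload is a hypothetical helper, not part of s3cmd and not its actual implementation; it assumes GNU md5sum and that the remote size and MD5 were read from something like s3cmd ls --list-md5.

```shell
# Hypothetical helper (not part of s3cmd): decide whether a local file differs
# from its S3 copy, given the remote size in bytes and the remote MD5.
# Mirrors the help text: in "size-only" mode (like --no-check-md5) only the
# size is compared; otherwise the MD5 sum is compared as well.
#   usage: needs_upload FILE REMOTE_SIZE REMOTE_MD5 [size-only]
#   returns 0 if the file should be (re)uploaded, 1 if it looks unchanged
needs_upload() {
    file=$1 remote_size=$2 remote_md5=$3 mode=${4:-md5}
    local_size=$(wc -c < "$file" | tr -d ' ')
    if [ "$local_size" != "$remote_size" ]; then
        return 0                 # sizes differ: always re-upload
    fi
    if [ "$mode" = "size-only" ]; then
        return 1                 # same size, MD5 not checked: treated as unchanged
    fi
    local_md5=$(md5sum "$file" | awk '{print $1}')
    [ "$local_md5" != "$remote_md5" ]   # re-upload only if the MD5 differs
}
```

A file whose content changed but whose size did not is reported as unchanged in size-only mode and as needing an upload in the default mode, which is exactly the risk the help text mentions for --no-check-md5.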



After a while I managed to write some bash code that compares the MD5 sum of each file on S3 with that of my local file, and removes the local files that are already on Amazon S3:


# $datacenter, $hostname and $path are assumed to be set earlier in the script
s3=$(s3cmd ls --list-md5 -H s3://company-backup/company/"$datacenter"/"$hostname"/"$path"/)

# keep only "md5 filename": column 4 is the MD5, column 5 the s3:// URI
s3_list=$(echo "$s3" | awk '{print $4" "$5}' | sed 's= .*/= =')

locally=$(md5sum /"$path"/*.gz)
# keep only "md5 filename" (strip the directory from the path)
locally_list=$(echo "$locally" | sed 's= .*/= =')
#echo "$locally_list"

# split the lists on newlines, not on every space
IFS=$'\n'
for i in $locally_list; do
  #echo "$i"
  locally_hash=$(echo "$i" | awk '{print $1}')
  locally_file=$(echo "$i" | awk '{print $2}')

  for j in $s3_list; do
    s3_hash=$(echo "$j" | awk '{print $1}')
    s3_file=$(echo "$j" | awk '{print $2}')

    # skip entries with no hash or file name (e.g. folder lines in the listing)
    if [[ $s3_hash != "" ]] && [[ $s3_file != "" ]]; then
      if [[ $s3_hash == "$locally_hash" ]] && [[ $s3_file == "$locally_file" ]]; then
        echo "### REMOVING ###"
        echo "$locally_file"
        #rm /"$path"/"$locally_file"
      fi
    fi
  done
done
unset IFS
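The nested loops above compare every local file with every S3 entry. As an alternative sketch, the same check can be done in a single awk pass that first indexes the S3 listing by file name and then looks up each local file. uploaded_files is a hypothetical name; it assumes the column layout the script above relies on (column 4 is the MD5, column 5 the s3:// URI), GNU md5sum, and file names without spaces. It only prints the matching paths, so deleting them stays a separate, deliberate step.

```shell
# Hypothetical helper: print the local files whose base name AND MD5 match an
# object in a saved `s3cmd ls --list-md5` listing (candidates for deletion).
# Assumes listing columns "date time size md5 s3://.../name" and no spaces
# in file names.
#   usage: uploaded_files LISTING_FILE LOCAL_FILE...
uploaded_files() {
    listing=$1
    shift
    [ "$#" -gt 0 ] || return 0   # no local files: nothing to check
    md5sum "$@" | awk '
        # Phase 1 (the S3 listing): remember the MD5 of each object by base
        # name. DIR lines have fewer than 5 fields and are skipped.
        phase != "local" {
            if (NF >= 5) {
                name = $5
                sub(/.*\//, "", name)
                remote[name] = $4
            }
            next
        }
        # Phase 2 (md5sum output, "md5  path"): print paths whose base name
        # and MD5 both match an S3 object.
        {
            name = $2
            sub(/.*\//, "", name)
            if ((name in remote) && remote[name] == $1)
                print $2
        }
    ' "$listing" phase=local -
}
```

The phase=local operand is a standard awk command-line assignment that takes effect only after the listing file has been read, which keeps the two inputs apart even when the listing is empty. A possible usage is to save the output of s3cmd ls --list-md5 to a file and call uploaded_files with that file and /"$path"/*.gz.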


FWIW, I needed to do something similar and wrote the following bash script. What it does:

  1. gets a list of the files in a directory that are older than $MINUTES minutes, using find
  2. uses lsof to make sure the file is not open (this check can miss a file that is, say, open in an editor)
  3. uses s3cmd to copy the file to an S3 bucket.
  4. compares the MD5 sums of the remote file on S3 and the local one. If they match, it deletes the local one.


#!/bin/bash
# LOCAL_DIR (directory to upload) and MINUTES (minimum file age) are assumed
# to be set by the caller; stop early if they are not.
: "${LOCAL_DIR:?LOCAL_DIR must be set}" "${MINUTES:?MINUTES must be set}"

TARGET_DIR="s3://AWSbucketname/subfolder/$(hostname -s)/"

echo ""
echo "About to upload files in $LOCAL_DIR up to S3 folder:"
echo "    $TARGET_DIR"
echo "Then delete if MD5 sums line up."
echo "Starting in 5 seconds..."
sleep 5


# Throw the list of files that the find command gets into an array
WAV_FILES=()
while IFS= read -d $'\0' -r file ; do
    WAV_FILES+=("$file")
done < <(find "$LOCAL_DIR" -name '*.wav' -mmin +"$MINUTES" -print0)

# echo "${WAV_FILES[@]}"   # DEBUG

for local_file in "${WAV_FILES[@]}"; do
    # Check that the file in question is not open.
    # lsof returns a non-zero value for a file that is not in use
    if ! lsof "$local_file" > /dev/null 2>&1; then
        echo ""
        echo "$local_file isn't open. Copying to S3..."
        s3cmd -p put "$local_file" "$TARGET_DIR"
        # s3cmd -n put "$local_file" "$TARGET_DIR" # DEBUG - dry-run

        ## Now attempt to delete if the MD5 sums check out:
        remote_file=$(basename "$local_file")
        md5sum_remote=$(s3cmd info "$TARGET_DIR$remote_file" | grep MD5 | awk '{print $3}')
        md5sum_local=$(md5sum "$local_file" | awk '{print $1}')
        # never delete when the remote sum could not be read
        if [[ -n "$md5sum_remote" && "$md5sum_remote" == "$md5sum_local" ]]; then
            echo "$remote_file MD5 sum checks out. Deleting..."
            rm "$local_file"
        fi
    fi
done
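Step 4 of the script can also be isolated into a function that reads s3cmd info output on standard input, which makes it possible to test the delete logic against saved output without touching S3. delete_if_md5_matches is a hypothetical name; the sketch assumes the "MD5 sum:" line that the script above greps for (hash in the third field) and GNU md5sum, and it refuses to delete when that line is missing.

```shell
# Hypothetical helper: delete LOCAL_FILE only if the "MD5 sum:" line of the
# `s3cmd info` output read on stdin is present and equals the local MD5.
#   usage: s3cmd info "$TARGET_DIR$remote_file" | delete_if_md5_matches "$local_file"
#   returns 0 after deleting, 1 if the file was kept
delete_if_md5_matches() {
    local_file=$1
    # "   MD5 sum:   <hash>" -> fields: MD5, sum:, <hash>
    md5_remote=$(awk '$1 == "MD5" && $2 == "sum:" { print $3; exit }')
    md5_local=$(md5sum "$local_file" | awk '{print $1}')
    if [ -n "$md5_remote" ] && [ "$md5_remote" = "$md5_local" ]; then
        rm -- "$local_file"
    else
        echo "MD5 missing or different, keeping $local_file" >&2
        return 1
    fi
}
```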


From the official documentation:

--delete-after (Perform deletes after new uploads [sync])


--delete-after-fetch (Delete remote objects after fetching to local file (only for [get] and [sync] commands).)

This one applies if you want to sync from remote to local; note that it deletes the remote objects, not the local copies.

