Category:AWS

From Traxel Wiki

Latest revision as of 00:51, 24 September 2023

CLI

S3 Sync

aws --profile monkey-banana s3 sync <local_directory> s3://<bucket_name>/<optional_prefix> --exact-timestamps --size-only

Here's a breakdown of the options:

  • <local_directory>: The local directory you want to sync with S3.
  • s3://<bucket_name>/<optional_prefix>: The destination S3 bucket and an optional prefix (like a folder).
  • --exact-timestamps: Only affects syncs from S3 to a local directory. By default, a download skips same-sized files unless the local copy is newer than the S3 object; with --exact-timestamps, same-sized files are skipped only when their timestamps match exactly. It has no effect on an upload like the command above.
  • --size-only: Compare files by size only and ignore timestamps. By default, sync uploads a file when its size differs or when the local copy is newer than the S3 object; --size-only drops the timestamp check. This is useful when timestamps change without the content changing. The catch is that a local edit that keeps the file size the same will not be uploaded.
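To make the comparison rules concrete, here is a simplified model of the upload decision as a shell function. It is an illustration, not the CLI's actual code: the name should_upload and its arguments are made up, it only covers files that already exist on both sides, and it leaves out --exact-timestamps because that option does not change uploads.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not the CLI's real code) of how
# `aws s3 sync` decides whether to upload a file that already exists
# in the destination bucket.
#
# Usage: should_upload <mode> <local_size> <s3_size> <local_mtime> <s3_mtime>
#   mode:  "default" or "size-only"
#   sizes in bytes, times in Unix epoch seconds
# Prints "upload" or "skip".
should_upload() {
  local mode=$1 local_size=$2 s3_size=$3 local_mtime=$4 s3_mtime=$5

  # A size difference always triggers an upload.
  if [ "$local_size" -ne "$s3_size" ]; then
    echo upload
    return
  fi

  # With --size-only, equal sizes mean the file is skipped,
  # even if the local copy was modified more recently.
  if [ "$mode" = "size-only" ]; then
    echo skip
    return
  fi

  # Default: also upload when the local file is newer than the S3 object.
  if [ "$local_mtime" -gt "$s3_mtime" ]; then
    echo upload
  else
    echo skip
  fi
}
```

For example, `should_upload size-only 100 100 2000 1000` prints skip even though the local file is newer, while the same call in default mode prints upload. On Linux, `stat -c %Y <file>` gives a file's modification time in epoch seconds.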

Note:

  • Make sure the profile you pass with --profile is configured in the AWS CLI with credentials that have the S3 permissions the sync needs: listing the bucket and writing objects, plus deleting objects if you use --delete.
  • By default, aws s3 sync won't delete files in the destination that are not present in the source. With the --delete option, it also deletes objects from the S3 bucket that are not present in the local directory. For example:

aws --profile iterative-chaos s3 sync dedupe s3://iterative-chaos/cyphernews/harvest/reddit/parquet/dedupe --size-only --delete
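Since --delete permanently removes objects (on a bucket without versioning), it can be worth previewing a sync first. This sketch runs the same kind of sync with the CLI's --dryrun flag and keeps only the planned deletions. The function name pending_deletes and the AWS override variable are hypothetical, and filtering on "delete:" is an assumption about how the CLI labels deletions in its dry-run output; check it against your CLI version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list what `s3 sync --delete` would remove, without
# changing anything. AWS defaults to the real CLI; override it to test.
AWS=${AWS:-aws}

# Usage: pending_deletes <profile> <local_dir> <s3_uri>
pending_deletes() {
  local profile=$1 src=$2 dest=$3
  # --dryrun prints the planned operations but performs none of them.
  # Assumption: planned deletions appear on lines containing "delete:".
  # `|| true` keeps "nothing to delete" from counting as an error.
  $AWS --profile "$profile" s3 sync "$src" "$dest" \
    --size-only --delete --dryrun | grep 'delete:' || true
}
```

Run `pending_deletes iterative-chaos dedupe s3://iterative-chaos/cyphernews/harvest/reddit/parquet/dedupe`, and if the list looks right, run the real command without --dryrun.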

Sync Cyphernews

Sync the Link List JSONs

aws s3 sync ../data/reddit/json/link_list s3://iterative-chaos/cyphernews/harvest/reddit/json/link_list --size-only --delete
aws s3 sync ../data/reddit/json/archive_link_list s3://iterative-chaos/cyphernews/harvest/reddit/json/archive_link_list --size-only --delete
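The two link-list commands differ only in the directory name, so they can also be run from a loop. This is a sketch: sync_link_lists, LOCAL_ROOT, S3_ROOT and the AWS override are hypothetical names whose defaults match the paths above, and it assumes you run it from the same working directory as the original commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sync both link-list directories with one loop.
# AWS, LOCAL_ROOT and S3_ROOT can be overridden; defaults match the wiki paths.
AWS=${AWS:-aws}
LOCAL_ROOT=${LOCAL_ROOT:-../data/reddit/json}
S3_ROOT=${S3_ROOT:-s3://iterative-chaos/cyphernews/harvest/reddit/json}

sync_link_lists() {
  local dir
  for dir in link_list archive_link_list; do
    # Same flags as the commands above; return at the first failed sync
    # with that command's exit status.
    $AWS s3 sync "$LOCAL_ROOT/$dir" "$S3_ROOT/$dir" --size-only --delete || return
  done
}
```

Setting `AWS=echo` before calling `sync_link_lists` prints the commands instead of running them, which is a cheap way to check the paths.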

Sync the Discussion JSONs

aws s3 sync ../data/reddit/json/discussion s3://iterative-chaos/cyphernews/harvest/reddit/json/discussion --size-only --delete

Sync Parquet

aws s3 sync ../data/reddit/parquet s3://iterative-chaos/cyphernews/harvest/reddit/parquet --size-only --delete
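After a sync with --size-only --delete, a rough sanity check is to compare how many files exist locally with how many objects sit under the prefix. The check_counts function below is a hypothetical helper built on find and aws s3 ls --recursive. Matching counts do not prove the contents match, and zero-byte "folder" objects created by other tools would show up as extra objects.

```shell
#!/usr/bin/env bash
# Hypothetical post-sync check: compare the number of local files with the
# number of objects under the S3 prefix. Override AWS to test offline.
AWS=${AWS:-aws}

# Usage: check_counts <local_dir> <s3_uri_without_trailing_slash>
# Prints "local=<n> s3=<m>"; returns non-zero if the counts differ.
check_counts() {
  local src=$1 dest=$2 n_local n_s3
  n_local=$(find "$src" -type f | wc -l)
  # With --recursive, `aws s3 ls` prints one line per object.
  # The trailing slash keeps keys that merely share the prefix name out.
  n_s3=$($AWS s3 ls "$dest/" --recursive | wc -l)
  # Arithmetic expansion strips the padding some wc versions add.
  n_local=$((n_local))
  n_s3=$((n_s3))
  echo "local=$n_local s3=$n_s3"
  [ "$n_local" -eq "$n_s3" ]
}
```

For the Parquet sync above, that would be `check_counts ../data/reddit/parquet s3://iterative-chaos/cyphernews/harvest/reddit/parquet`.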
