Category:CypherTech: Difference between revisions

From Traxel Wiki
[[Category:Cypherpunk]]
= Web UX =
* HTML First Design / Minimal Javascript
** https://old.reddit.com/r/ProgrammerHumor/comments/18r9fu0/theworldwouldbebetterwithplainhtml/

= Resilience =
== Sync ==
=== Pull Discussions ===
This has to be done separately because the remote harvester host does not keep the full discussion archive. The remote host only has the current and previous month, so those month directories can be pulled with --delete without touching older local data.
<pre>
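cd /opt/cypherpunk/data/reddit/json
# Prints one rsync command per 2023-09 directory under discussion/; append "| sh" to actually run them.
find discussion -name '2023-09' -type d | awk '{print "rsync -avz --size-only --delete www.iterativechaos.com:/opt/cypherpunk/data/reddit/json/"$1"/ ./"$1"/"}'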
</pre>
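A Python version of the find/awk step above, as a minimal sketch. month_sync_commands is a hypothetical helper (not one of the existing scripts): it walks discussion/ for directories named after the month, like find -name ... -type d, and returns the same rsync command strings. The host and base path are copied from the command above.

<syntaxhighlight lang="python">
from pathlib import Path

# Host and base path copied from the rsync command above.
REMOTE_HOST = "www.iterativechaos.com"
REMOTE_BASE = "/opt/cypherpunk/data/reddit/json"


def month_sync_commands(local_root: str, month: str) -> list[str]:
    """Hypothetical helper mirroring the find | awk pipeline above.

    Finds every directory named `month` (e.g. "2023-09") below
    local_root/discussion and returns one rsync command per directory.
    """
    root = Path(local_root)
    commands = []
    for path in sorted((root / "discussion").rglob(month)):
        if not path.is_dir():  # same filter as find ... -type d
            continue
        rel = path.relative_to(root).as_posix()  # e.g. discussion/news/2023-09
        commands.append(
            "rsync -avz --size-only --delete "
            f"{REMOTE_HOST}:{REMOTE_BASE}/{rel}/ ./{rel}/"
        )
    return commands
</syntaxhighlight>

Each returned command could be run with subprocess.run(cmd.split(), check=True); splitting on whitespace assumes the paths contain no spaces.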
 
=== Pull All Other JSON ===
<pre>
rsync -avz --size-only --delete www.iterativechaos.com:/opt/cypherpunk/data/reddit/json/link_list/ /opt/cypherpunk/data/reddit/json/link_list/
rsync -avz --size-only --delete www.iterativechaos.com:/opt/cypherpunk/data/reddit/json/archive_link_list/ /opt/cypherpunk/data/reddit/json/archive_link_list/
</pre>
 
=== S3 Backup ===
==== JSON, Daily ====
<pre>
time aws s3 sync --size-only --delete /opt/cypherpunk/data/reddit/json/ s3://iterative-chaos/cyphernews/harvest/reddit/json/
</pre>
==== Parquet Compacted, Weekly ====
<pre>
time aws s3 sync --size-only --delete /opt/cypherpunk/data/reddit/parquet/compacted_raw/ s3://iterative-chaos/cyphernews/harvest/reddit/parquet/compacted_raw/
</pre>
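aws s3 sync --size-only treats a file as changed only when its size differs from the destination object, and --delete removes destination objects that no longer exist locally. A minimal pure-Python sketch of that decision logic; local_sizes and plan_size_only_sync are hypothetical names, and nothing here talks to S3.

<syntaxhighlight lang="python">
from pathlib import Path


def local_sizes(root: str) -> dict[str, int]:
    """Hypothetical helper: map each file under root (relative POSIX path) to its size."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.stat().st_size
        for p in base.rglob("*")
        if p.is_file()
    }


def plan_size_only_sync(
    local: dict[str, int], remote: dict[str, int], delete: bool = True
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (keys to upload, keys to delete), comparing sizes only."""
    # Upload anything missing remotely or whose size differs (the --size-only rule).
    to_upload = sorted(k for k, size in local.items() if remote.get(k) != size)
    # With --delete, remote keys that no longer exist locally are removed.
    to_delete = sorted(k for k in remote if k not in local) if delete else []
    return to_upload, to_delete
</syntaxhighlight>

The trade-off of --size-only: a file rewritten with exactly the same byte count is not re-uploaded, in exchange for skipping the timestamp comparison.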


= GPT =


* API Site: https://platform.openai.com/docs/introduction
 
* GPT 4o: https://platform.openai.com/docs/models/gpt-4o
= Data Processing Process =
# $ python get_link_lists.py # daily
## Needs data archival / backup.
### Doing a brute-force full backup for now; that might be sufficient for the time being.
#### $ scp -i key.pem reddit-link-list.tar.bz2 admin@www.iterativechaos.com:./
### Can do just week/day going forward, but that will quickly get slow.
### Storing the files as .json.bz2 would be a big improvement
# $ python parquet_link_list.py
## (done) Make this iterate and do a full refresh each time
## Reconsider full refresh when processing time goes over a minute (currently runs in 12 seconds; unclear whether that is real or user time)
## Old Version:
### $ python parquet_link_list.py ../data/reddit/link_list/day/science/ ../data/reddit/parquet/link_list/day/science/
# $ python dedupe_link_list_parquet.py
## (done) Make this iterate and do a full refresh each time
## Reconsider full refresh when processing time goes over a minute (currently runs in 2 seconds, real time)
# $ python get_discussions.py
## Make this check the existing download's harvest timestamp and num_comments
# $ python discussion_to_gpt.py
## Make this iterate and do a full refresh each time
## Reconsider full refresh when processing time goes over a minute
# Send the output to ChatGPT
## Change this to an API call to ChatGPT
## Make this check for an existing generated result and compare the harvest timestamp against the generation timestamp
## Add data archival / backup
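Storing the harvested files as .json.bz2, as suggested above, needs only the standard library. A minimal sketch; write_json_bz2, read_json_bz2, and compress_json_file are hypothetical helper names, not functions in the existing scripts.

<syntaxhighlight lang="python">
import bz2
import json
import shutil


def write_json_bz2(path: str, obj) -> None:
    """Hypothetical helper: serialize obj as JSON into a bzip2-compressed file."""
    # Text mode ("wt") lets json.dump write str while bz2 handles compression.
    with bz2.open(path, "wt", encoding="utf-8") as f:
        json.dump(obj, f)


def read_json_bz2(path: str):
    """Hypothetical helper: load JSON back from a .json.bz2 file."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def compress_json_file(src: str) -> str:
    """Hypothetical helper: write src as src + '.bz2' and return the new path.

    The original file is left in place; delete it once the copy is verified.
    """
    dst = src + ".bz2"
    with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files are not read at once
    return dst
</syntaxhighlight>

Since bz2.open supports text mode, existing json.load/json.dump calls could probably switch over with only the open call changed.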
 
* 2023-08-26: Overnight buildup: 3,649 (bug)
* 2023-08-27: Overnight buildup: 369
 
= Interesting Subreddits =
* aitah
* antiwork
* ask
* askmen
* askreddit
* askscience
* chatgpt
* conservative
* dataisbeautiful
* explainlikeimfive
* latestagecapitalism
* leopardsatemyface
* lifeprotips
* news
* nostupidquestions
* outoftheloop
* personalfinance
* politics
* programmerhumor
* science
* technology
* todayilearned
* tooafraidtoask
* twoxchromosomes
* unpopularopinion
* worldnews
* youshouldknow
 
= Reddit OAuth2 =
* https://www.reddit.com/r/redditdev/wiki/oauth2/explanation/
* https://www.reddit.com/dev/api/oauth/
* https://github.com/reddit-archive/reddit/wiki/OAuth2
** https://github.com/reddit-archive/reddit/wiki/OAuth2#application-only-oauth
 
'''Example Curl Request'''
<syntaxhighlight lang="bash" line>
curl \
  -X POST \
  -d 'grant_type=password&username=reddit_bot&password=snoo' \
  --user 'p-jcoLKBynTLew:gko_LXELoV07ZBNUXrvWZfzE3aI' \
  https://www.reddit.com/api/v1/access_token
</syntaxhighlight>
'''Real Curl Request'''
<syntaxhighlight lang="bash" line>
curl \
  -X POST \
  -d 'grant_type=client_credentials' \
  --user 'client_id:client_secret' \
  https://www.reddit.com/api/v1/access_token
</syntaxhighlight>
'''One Line'''
<syntaxhighlight lang="bash" line>
curl -X POST -d 'grant_type=client_credentials' --user 'client_id:client_secret' https://www.reddit.com/api/v1/access_token
</syntaxhighlight>
'''OAuth Data Call'''
<syntaxhighlight lang="bash" line>
# Quote the second URL: an unquoted & would end the command and run it in the background.
curl -H "Authorization: bearer J1qK1c18UUGJFAzz9xnH56584l4" -A "Traxelbot/0.1 by rbb36" https://oauth.reddit.com/api/v1/me
curl -H "Authorization: bearer J1qK1c18UUGJFAzz9xnH56584l4" -A "Traxelbot/0.1 by rbb36" 'https://oauth.reddit.com/r/news/top?t=day&limit=100'
</syntaxhighlight>
* https://old.reddit.com/r/worldnews/top/?sort=top&t=day
* /r/subreddit/top?t=day&limit=100
* count=100&
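The same application-only flow in Python, as a sketch using only the standard library (urllib) instead of curl. build_token_request, build_api_request, and fetch_json are hypothetical names; the client ID and secret are placeholders, and reading the token from an access_token field in the response is an assumption to check against the Reddit OAuth2 docs linked above.

<syntaxhighlight lang="python">
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://www.reddit.com/api/v1/access_token"
USER_AGENT = "Traxelbot/0.1 by rbb36"  # same User-Agent as the curl calls above


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Hypothetical helper: POST grant_type=client_credentials with HTTP Basic auth."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {creds}", "User-Agent": USER_AGENT},
    )


def build_api_request(token: str, path: str, **params: str) -> urllib.request.Request:
    """Hypothetical helper: authenticated GET against oauth.reddit.com."""
    query = urllib.parse.urlencode(params)  # urlencode also handles the & safely
    url = f"https://oauth.reddit.com{path}" + (f"?{query}" if query else "")
    return urllib.request.Request(
        url, headers={"Authorization": f"bearer {token}", "User-Agent": USER_AGENT}
    )


def fetch_json(req: urllib.request.Request) -> dict:
    """Hypothetical helper: send the request and decode the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
</syntaxhighlight>

Usage would look like token = fetch_json(build_token_request(client_id, client_secret))["access_token"], then fetch_json(build_api_request(token, "/r/news/top", t="day", limit="100")).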
= Reddit Python =
* pip install aiofiles aiohttp (asyncio ships with the Python standard library and does not need to be installed)
* https://realpython.com/async-io-python/
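The aiohttp/asyncio approach comes down to running many fetches concurrently without flooding the API. A minimal sketch with asyncio.Semaphore; gather_limited is a hypothetical helper, and the fetch callable is left abstract so any HTTP client can be plugged in.

<syntaxhighlight lang="python">
import asyncio
from typing import Awaitable, Callable


async def gather_limited(
    items: list[str],
    fetch: Callable[[str], Awaitable[dict]],
    limit: int = 4,
) -> dict[str, dict]:
    """Hypothetical helper: run fetch(item) for every item, at most `limit` at once.

    Returns {item: result}. fetch is any coroutine function, for example one
    that requests /r/<subreddit>/top through a shared aiohttp session.
    """
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(item: str) -> tuple[str, dict]:
        async with semaphore:  # waits here while `limit` fetches are in flight
            return item, await fetch(item)

    pairs = await asyncio.gather(*(fetch_one(item) for item in items))
    return dict(pairs)
</syntaxhighlight>

With aiohttp, fetch would typically call session.get(url, headers=...) on a shared aiohttp.ClientSession and return await resp.json(). The default limit of 4 is an arbitrary placeholder, not a Reddit-documented rate.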
== t3 fields of interest ==
# "url_overridden_by_dest": "https://www.nbcnews.com/politics/donald-trump/live-blog/trump-georgia-indictment-rcna98900",
# "url": "https://www.nbcnews.com/politics/donald-trump/live-blog/trump-georgia-indictment-rcna98900",
# "title": "What infamous movie plot hole has an explanation that you're tired of explaining?",
# "downs": 0,
# "upvote_ratio": 0.94,
# "ups": 10891,
# "score": 10891,
# "created": 1692286512.0,
# "num_comments": 8112,
# "created_utc": 1692286512.0,
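In a listing response (such as /r/news/top), each post is a child of kind t3 under data.children, with the fields above inside its data object. A minimal sketch that keeps just those fields; t3_records is a hypothetical helper, and created_iso is an added convenience column, not a Reddit field.

<syntaxhighlight lang="python">
from datetime import datetime, timezone

# The t3 fields of interest listed above.
T3_FIELDS = (
    "url_overridden_by_dest", "url", "title", "downs", "upvote_ratio",
    "ups", "score", "created", "num_comments", "created_utc",
)


def t3_records(listing: dict) -> list[dict]:
    """Hypothetical helper: one dict per t3 child, limited to T3_FIELDS."""
    records = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t3":  # skip anything that is not a link post
            continue
        data = child.get("data", {})
        record = {field: data.get(field) for field in T3_FIELDS}  # missing -> None
        # Convenience column (not a Reddit field): created_utc as ISO-8601 UTC.
        if record["created_utc"] is not None:
            record["created_iso"] = datetime.fromtimestamp(
                record["created_utc"], tz=timezone.utc
            ).isoformat()
        records.append(record)
    return records
</syntaxhighlight>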
 
= Minimal Term Set =
<pre>
hands, mouth, eyes, head, ears, nose, face, legs, teeth, fingers, breasts, skin, bones, blood,
be born, children, men, women, mother, father, wife, husband,
long, round, flat, hard, soft, sharp, smooth, heavy, sweet,
stone, wood, made of,
be on something, at the top, at the bottom, in front, around,
sky, ground, sun, during the day, at night, water, fire, rain, wind,
day,
creature, tree, grow (in ground), egg, tail, wings, feathers, bird, fish, dog,
we, know (someone), be called,
hold, sit, lie, stand, sleep,
play, laugh, sing, make, kill, eat, drink,
river, mountain, jungle/forest, desert, sea, island,
rain, wind, snow, ice, air,
flood, storm, drought, earthquake,
east, west, north, south,
bird, fish, tree,
dog, cat, horse, sheep, goat, cow, pig (camel, buffalo, caribou, seal, etc.),
mosquitoes, snake, flies,
family, we,
year, month, week, clock, hour,
house, village, city,
school, hospital, doctor, nurse, teacher, soldier,
country, government, the law, vote, border, flag, passport,
meat, rice, wheat, corn (yams, plantain, etc.), flour, salt, sugar, sweet,
knife, key, gun, bomb, medicines,
paper, iron, metal, glass, leather, wool, cloth, thread,
gold, rubber, plastic, oil, coal, petrol,
car, bicycle, plane, boat, train, road, wheel, wire, engine, pipe, telephone, television, phone, computer,
read, write, book, photo, newspaper, film,
money, God, war, poison, music,
go/went, burn, fight, buy/pay, learn,
clean
</pre>

Latest revision as of 00:00, 7 June 2024

Web UX

GPT

<pre>
openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-CleAC9cITP7hCx44ZiB2gw5d on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.
</pre>
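The error above is the client hitting the tokens-per-minute limit. A common remedy is retrying with exponential backoff and jitter. A minimal, library-agnostic sketch: call_with_backoff is a hypothetical helper, and the exception class to retry on is passed in (with the pre-1.0 openai client that produced this message, that would be openai.error.RateLimitError).

<syntaxhighlight lang="python">
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn(), retrying on the given exceptions.

    Delays grow as base_delay * 2**attempt (capped at max_delay), each scaled by
    a random factor in [0.5, 1.0] so parallel workers do not retry in lockstep.
    The last failure is re-raised once max_attempts is used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
</syntaxhighlight>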

Pages in category "CypherTech"

The following 7 pages are in this category, out of 7 total.