Revision as of 17:06, 5 October 2023 by RobertBushman
Data Processing Process
- $ python # daily
- Needs data archival / backup.
- Doing a brute-force full backup right now, might be sufficient for the time being.
- $ scp -i key.pem reddit-link-list.tar.bz2
- Can do just week/day going forward, but that will quickly get slow.
- Storing the files as .json.bz2 would be a big improvement
- $ python
- (done) Make this iterate and do a full refresh each time
- Reconsider full refresh when processing time goes over a minute (currently runs in 12 seconds; not sure whether that is real or user time)
- Old Version:
- $ python ../data/reddit/link_list/day/science/ ../data/reddit/parquet/link_list/day/science/
- $ python
- (done) Make this iterate and do a full refresh each time
- Reconsider full refresh when processing time goes over a minute (currently runs in 2 seconds (real))
- $ python
- Make this check the existing download, harvest timestamp, num_comments
- $ python
- Make this iterate and do a full refresh each time
- Reconsider full refresh when processing time goes over a minute
- Send the output to ChatGPT
- Change this to an API call to ChatGPT
- Make this check the existing generated output, comparing harvest and generate timestamps
- Add data archival / backup
- 2023-08-26: Overnight buildup: 3,649 (bug)
- 2023-08-27: Overnight buildup: 369
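The note above about storing files as .json.bz2 could look roughly like the sketch below. It uses only the standard library; the function names and any paths passed in are hypothetical, not the names used by the actual scripts.

```python
import bz2
import json
from pathlib import Path


def save_json_bz2(obj, path):
    """Write obj as bz2-compressed JSON text (hypothetical helper)."""
    path = Path(path)
    # Create the day/week/subreddit directories if they do not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # "wt" opens the compressed file in text mode, so json.dump can write to it.
    with bz2.open(path, "wt", encoding="utf-8") as handle:
        json.dump(obj, handle)


def load_json_bz2(path):
    """Read a file written by save_json_bz2 back into Python objects."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Because the files are compressed individually, the existing tar + scp backup can archive them as-is, and downstream steps only need to swap open() for bz2.open().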
Interesting Subreddits
- aitah
- antiwork
- ask
- askmen
- askreddit
- askscience
- chatgpt
- conservative
- dataisbeautiful
- explainlikeimfive
- latestagecapitalism
- leopardsatemyface
- lifeprotips
- news
- nostupidquestions
- outoftheloop
- personalfinance
- politics
- programmerhumor
- science
- technology
- todayilearned
- tooafraidtoask
- twoxchromosomes
- unpopularopinion
- worldnews
- youshouldknow
Reddit OAuth2
Example Curl Request
-d 'grant_type=password&username=reddit_bot&password=snoo'
--user 'p-jcoLKBynTLew:gko_LXELoV07ZBNUXrvWZfzE3aI'
Real Curl Request
-d 'grant_type=client_credentials'
--user 'client_id:client_secret'
One Line
curl -X POST -d 'grant_type=client_credentials' --user 'client_id:client_secret'
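A rough Python equivalent of the one-line curl request, as a sketch using only the standard library. The token URL is not present in the notes above; the value below is an assumption taken from Reddit's OAuth2 documentation, and the function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: Reddit's documented token endpoint (the URL is missing from these notes).
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id, client_secret, user_agent):
    """Build the POST request that mirrors the curl command."""
    # curl --user 'id:secret' sends HTTP Basic authentication.
    credentials = f"{client_id}:{client_secret}".encode("ascii")
    basic = base64.b64encode(credentials).decode("ascii")
    # curl -d 'grant_type=client_credentials' sends a form-encoded body.
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {basic}", "User-Agent": user_agent},
        method="POST",
    )


def fetch_token(client_id, client_secret, user_agent):
    """Send the request and return the parsed JSON response."""
    request = build_token_request(client_id, client_secret, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Splitting the request construction from the network call keeps the first function easy to check without credentials. The JSON returned by fetch_token should carry the bearer token used in the data call.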
OAuth Data Call
$ curl -H "Authorization: bearer J1qK1c18UUGJFAzz9xnH56584l4" -A "Traxelbot/0.1 by rbb36"
- /r/subreddit/top?t=day&limit=100
- count=100&
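The data call can be sketched in Python as well. The host is missing from the curl lines above; https://oauth.reddit.com is an assumption based on Reddit's API documentation, as is the use of the after parameter alongside count for paging. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: host used for OAuth-authenticated API calls (not present in these notes).
API_BASE = "https://oauth.reddit.com"


def build_top_request(subreddit, token, user_agent, period="day", limit=100,
                      count=None, after=None):
    """Build a GET request for /r/<subreddit>/top with the noted parameters."""
    params = {"t": period, "limit": limit}
    # count and after are only needed when asking for a later page.
    if count is not None:
        params["count"] = count
    if after is not None:
        params["after"] = after
    query = urllib.parse.urlencode(params)
    url = f"{API_BASE}/r/{urllib.parse.quote(subreddit)}/top?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"bearer {token}", "User-Agent": user_agent},
    )
```

The request can then be passed to urllib.request.urlopen and the body parsed with json.load.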
Reddit Python
- pip install aiofiles aiohttp (asyncio ships with Python 3 and does not need to be installed)
t3 fields of interest
- "url_overridden_by_dest": "",
- "url": "",
- "title": "What infamous movie plot hole has an explanation that you're tired of explaining?",
- "downs": 0,
- "upvote_ratio": 0.94,
- "ups": 10891,
- "score": 10891,
- "created": 1692286512.0,
- "num_comments": 8112,
- "created_utc": 1692286512.0,
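A minimal sketch of pulling just these fields out of a listing response. It assumes the usual listing shape, where posts sit under data.children and each post has kind "t3"; the function name is hypothetical.

```python
# Fields of interest, in the order listed above.
FIELDS = (
    "url_overridden_by_dest",
    "url",
    "title",
    "downs",
    "upvote_ratio",
    "ups",
    "score",
    "created",
    "num_comments",
    "created_utc",
)


def extract_links(listing):
    """Return one dict per t3 (link) entry, keeping only FIELDS."""
    rows = []
    for child in listing.get("data", {}).get("children", []):
        # Skip anything that is not a link, such as comments (t1).
        if child.get("kind") != "t3":
            continue
        post = child.get("data", {})
        # Missing fields become None so every row has the same keys.
        rows.append({field: post.get(field) for field in FIELDS})
    return rows
```

Keeping every row to the same fixed set of keys should make the later conversion to parquet simpler, since the columns are known up front.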
Minimal Term Set
hands, mouth, eyes, head, ears, nose, face, legs, teeth, fingers, breasts, skin, bones, blood, be born, children, men, women, mother, father, wife, husband, long, round, flat, hard, soft, sharp, smooth, heavy, sweet, stone, wood, made of, be on something, at the top, at the bottom, in front, around, sky, ground, sun, during the day, at night, water, fire, rain, wind, day, creature, tree, grow (in ground), egg, tail, wings, feathers, bird, fish, dog, we, know (someone), be called, hold, sit, lie, stand, sleep, play, laugh, sing, make, kill, eat, drink, river, mountain, jungle/forest, desert, sea, island, rain, wind, snow, ice, air, flood, storm, drought, earthquake, east, west, north, south, bird, fish, tree, dog, cat, horse, sheep, goat, cow, pig (camel, buffalo, caribou, seal, etc.), mosquitoes, snake, flies, family, we, year, month, week, clock, hour, house, village, city, school, hospital, doctor, nurse, teacher, soldier, country, government, the law, vote, border, flag, passport, meat, rice, wheat, corn (yams, plantain, etc.), flour, salt, sugar, sweet, knife, key, gun, bomb, medicines, paper, iron, metal, glass, leather, wool, cloth, thread, gold, rubber, plastic, oil, coal, petrol, car, bicycle, plane, boat, train, road, wheel, wire, engine, pipe, telephone, television, phone, computer, read, write, book, photo, newspaper, film, money, God, war, poison, music, go/went, burn, fight, buy/pay, learn, clean