
From Traxel Wiki

Data Processing Process

  1. $ python
    1. Pulls 24h top 100 every 4 hours
    2. Pulls 7d top 1,000 every 24 hours
    3. Stores as json.bz2, in pages of 100 links.
  2. $ python
    1. Unions all json.bz2 rows into Parquet (i.e., the raw layer).
    2. Uses compaction; it slows down after about a week without a compaction run.
  3. $ python
    1. Takes the most recent entry for each link_id from the raw layer.
    2. Writes the result to Parquet files, one per subreddit per day.
  4. $ python
    1. Finds discussions above a threshold of comments and upvotes.
    2. Downloads the discussion and stores as json.bz2.
    3. Skips the download if a file already exists and the comment count has grown by no more than 20%.
    4. TODO: do a "final download" after a week (or some other cutoff).
  5. $ python
    1. Finds the top couple of discussions in a set of subreddits.
    2. Pulls the top N tokens worth of comments from each discussion.
    3. Sends to GPT to summarize.
    4. Writes each summary to html/summary/{link-id}.html
    5. Writes the list to archive/gpt_stories_{edition_id}.json
  6. $ python
    1. Loads the 10 most recent gpt_stories_*.json
    2. Generates the news.html file.
  7. $ rsync -avz html/summary /var/www/html/
  8. $ cp html/news.html /var/www/html/
  • 2023-08-26: Overnight buildup: 3,649 (bug)
  • 2023-08-27: Overnight buildup: 369
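The storage half of step 1 (pages of 100 links, each saved as json.bz2) can be sketched with the standard library. This is a minimal sketch, assuming each pull yields a plain list of link dicts; the function names `write_pages`/`read_page` and the `{prefix}_{page:03d}.json.bz2` naming scheme are hypothetical, not taken from the original scripts.

```python
import bz2
import json
from pathlib import Path


def write_pages(links, out_dir, prefix, page_size=100):
    """Write `links` as bz2-compressed JSON files of at most `page_size` links each.

    The file naming scheme ({prefix}_{page:03d}.json.bz2) is hypothetical.
    Returns the list of paths written, in page order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for page, start in enumerate(range(0, len(links), page_size)):
        path = out / f"{prefix}_{page:03d}.json.bz2"
        # Text mode ("wt") lets json.dump write str straight into the compressor.
        with bz2.open(path, "wt", encoding="utf-8") as f:
            json.dump(links[start:start + page_size], f)
        paths.append(path)
    return paths


def read_page(path):
    """Load one stored page back into a list of link dicts."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

With the 7-day pull of 1,000 links, this layout produces ten files per run; the 24-hour top 100 fits in a single page.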
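The deduplication in step 3 (keep the most recent entry per link_id, then split by subreddit and day) is sketched below in plain Python. It assumes each raw row is a dict with `link_id`, `subreddit`, `created_utc`, and a hypothetical `fetched_at` field recording when the row was pulled. The real scripts operate on Parquet, so this only illustrates the logic.

```python
from collections import defaultdict
from datetime import datetime, timezone


def latest_per_link(rows):
    """Keep only the most recently fetched row for each link_id.

    `fetched_at` is an assumed pull timestamp; on ties the first row seen wins.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["link_id"])
        if current is None or row["fetched_at"] > current["fetched_at"]:
            latest[row["link_id"]] = row
    return list(latest.values())


def group_by_subreddit_day(rows):
    """Bucket rows by (subreddit, UTC creation day), one bucket per output file."""
    groups = defaultdict(list)
    for row in rows:
        day = datetime.fromtimestamp(row["created_utc"], tz=timezone.utc).date().isoformat()
        groups[(row["subreddit"], day)].append(row)
    return dict(groups)
```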
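The download decision in step 4 comes down to two checks: the thresholds, and the 20% comment-growth rule for files that already exist. A minimal sketch, assuming the comment count at the time of the last download is available. The page does not state the comment and upvote thresholds, so `min_comments` and `min_score` below are placeholders.

```python
def is_candidate(link, min_comments=100, min_score=500):
    """True if a link clears both the comment and the upvote thresholds.

    The default thresholds are placeholders, not the production values.
    """
    return link["num_comments"] >= min_comments and link["score"] >= min_score


def should_download(current_comments, stored_comments=None, max_growth=0.20):
    """Decide whether to (re)download a discussion.

    stored_comments is the comment count when the existing json.bz2 file was
    saved, or None if no file exists yet. An existing file is refreshed only
    when comments have grown by more than max_growth (20% by default).
    """
    if stored_comments is None:
        return True
    if stored_comments == 0:
        return current_comments > 0
    return current_comments > stored_comments * (1 + max_growth)
```

For example, a discussion stored at 100 comments is skipped at 120 and fetched again at 121.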
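Loading the 10 most recent editions in step 6 can be sketched with `pathlib.Path.glob`. This assumes `edition_id` values sort chronologically as strings (for example, zero-padded timestamps). If they do not, sort on file modification time instead. The function name is hypothetical.

```python
import json
from pathlib import Path


def load_recent_editions(archive_dir="archive", limit=10):
    """Return the `limit` most recent gpt_stories_*.json payloads, newest first.

    Assumes edition_id sorts chronologically as a string.
    """
    paths = sorted(Path(archive_dir).glob("gpt_stories_*.json"), reverse=True)
    editions = []
    for path in paths[:limit]:
        with path.open(encoding="utf-8") as f:
            editions.append(json.load(f))
    return editions
```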

Interesting Subreddits

  • aitah
  • antiwork
  • ask
  • askmen
  • askreddit
  • askscience
  • chatgpt
  • conservative
  • dataisbeautiful
  • explainlikeimfive
  • latestagecapitalism
  • leopardsatemyface
  • lifeprotips
  • news
  • nostupidquestions
  • outoftheloop
  • personalfinance
  • politics
  • programmerhumor
  • science
  • technology
  • todayilearned
  • tooafraidtoask
  • twoxchromosomes
  • unpopularopinion
  • worldnews
  • youshouldknow

Reddit OAuth2

Example Curl Request

  -d 'grant_type=password&username=reddit_bot&password=snoo'
  --user 'p-jcoLKBynTLew:gko_LXELoV07ZBNUXrvWZfzE3aI'

Real Curl Request

  -d 'grant_type=client_credentials'
  --user 'client_id:client_secret'

One Line

curl -X POST -d 'grant_type=client_credentials' --user 'client_id:client_secret'
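The same client-credentials exchange can be built with the standard library. The sketch below only constructs the request and does not send it. It assumes Reddit's documented token endpoint, `https://www.reddit.com/api/v1/access_token`. The function name and the `client_id`/`client_secret` arguments are placeholders.

```python
import base64
import urllib.parse
import urllib.request

# Reddit's documented OAuth2 token endpoint (assumed unchanged since this page was written).
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id, client_secret, user_agent):
    """Build the POST equivalent to the curl one-liner:
    curl -X POST -d 'grant_type=client_credentials' --user 'client_id:client_secret' TOKEN_URL
    """
    # curl --user sends HTTP Basic auth: base64("client_id:client_secret").
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {creds}", "User-Agent": user_agent},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and parsing the body with `json.load` should yield the `access_token` used as the bearer token in the OAuth data call.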

OAuth Data Call

$ curl -H "Authorization: bearer J1qK1c18UUGJFAzz9xnH56584l4" -A "Traxelbot/0.1 by rbb36"
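An authenticated data call can be sketched the same way. It relies on Reddit's rule that bearer-token requests go to `https://oauth.reddit.com` rather than `www.reddit.com`. The `/r/{subreddit}/top` path with `t=day` and `limit=100` mirrors the 24h top-100 pull in step 1, but that choice and the function name are illustrative assumptions.

```python
import urllib.parse
import urllib.request

API_BASE = "https://oauth.reddit.com"  # OAuth (bearer-token) requests go here


def build_listing_request(token, user_agent, subreddit="all", period="day", limit=100, after=None):
    """Build the GET equivalent to:
    curl -H "Authorization: bearer TOKEN" -A "USER_AGENT" API_BASE/r/SUBREDDIT/top?...
    """
    params = {"t": period, "limit": limit}
    if after:
        params["after"] = after  # fullname of the last item on the previous page
    url = f"{API_BASE}/r/{subreddit}/top?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"bearer {token}", "User-Agent": user_agent},
    )
```

The response is a Listing; passing its `data.after` value back as `after` requests the next page.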

Reddit Python

t3 (link/post) fields of interest

  1. "url_overridden_by_dest": "",
  2. "url": "",
  3. "title": "What infamous movie plot hole has an explanation that you're tired of explaining?",
  4. "downs": 0,
  5. "upvote_ratio": 0.94,
  6. "ups": 10891,
  7. "score": 10891,
  8. "created": 1692286512.0,
  9. "num_comments": 8112,
  10. "created_utc": 1692286512.0,
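Pulling these fields out of a listing response can be sketched as follows. It assumes the standard Listing shape, where `data.children` holds objects with a `kind` (`t3` for links) and a `data` dict. `FIELDS` mirrors the list above, plus `id`, which is an addition here so rows can be keyed per link. `.get()` is used because some fields, such as `url_overridden_by_dest`, may be missing on some posts.

```python
# The t3 fields listed above, plus "id" (added here so rows can be keyed by link).
FIELDS = (
    "id", "url_overridden_by_dest", "url", "title", "downs", "upvote_ratio",
    "ups", "score", "created", "num_comments", "created_utc",
)


def extract_t3(listing):
    """Return one dict of FIELDS per t3 (link) child in a Reddit Listing."""
    rows = []
    for child in listing["data"]["children"]:
        if child.get("kind") != "t3":
            continue  # skip comments (t1) and any other kinds
        data = child["data"]
        # Missing fields come back as None rather than raising KeyError.
        rows.append({field: data.get(field) for field in FIELDS})
    return rows
```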

Minimal Term Set

hands, mouth, eyes, head, ears, nose, face, legs, teeth, fingers, breasts, skin, bones, blood,
be born, children, men, women, mother, father, wife, husband,
long, round, flat, hard, soft, sharp, smooth, heavy, sweet,
stone, wood, made of,
be on something, at the top, at the bottom, in front, around,
sky, ground, sun, during the day, at night, water, fire, rain, wind,
creature, tree, grow (in ground), egg, tail, wings, feathers, bird, fish, dog,
we, know (someone), be called,
hold, sit, lie, stand, sleep,
play, laugh, sing, make, kill, eat, drink,
river, mountain, jungle/forest, desert, sea, island,
rain, wind, snow, ice, air,
flood, storm, drought, earthquake,
east, west, north, south,
bird, fish, tree,
dog, cat, horse, sheep, goat, cow, pig (camel, buffalo, caribou, seal, etc.),
mosquitoes, snake, flies,
family, we,
year, month, week, clock, hour,
house, village, city,
school, hospital, doctor, nurse, teacher, soldier,
country, government, the law, vote, border, flag, passport,
meat, rice, wheat, corn (yams, plantain, etc.), flour, salt, sugar, sweet,
knife, key, gun, bomb, medicines,
paper, iron, metal, glass, leather, wool, cloth, thread,
gold, rubber, plastic, oil, coal, petrol,
car, bicycle, plane, boat, train, road, wheel, wire, engine, pipe, telephone, television, phone, computer,
read, write, book, photo, newspaper, film,
money, God, war, poison, music,
go/went, burn, fight, buy/pay, learn,