2019-02-06: Back to the Keyboard
Time to get cracking at software again. Today I’m looking at Redis, 4chan, and Golang.
Chantrending
I had a pet project a while back that simply used NLTK to identify any proper nouns and put them in a PostgreSQL database so they could be graphed. You could see either the most popular proper noun at a given Unix time, or how proper nouns trended over time on a time-series graph.
The way I’m doing this is having a Golang process that queries the 4chan API with separate worker processes for each board. After querying, it’ll dump everything into a Redis store and serve the data from that store through an endpoint that a graphing application can query.
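A minimal sketch of the one-worker-per-board idea, assuming goroutines stand in for the "worker processes." The names fetchFunc, memStore, and crawlOnce are hypothetical, the fetch function is a stub rather than a real 4chan API call, and the in-memory store is only a placeholder for Redis:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for one request to the 4chan API for a single board.
// It is a hypothetical hook so the sketch runs without network access.
type fetchFunc func(board string) ([]string, error)

// memStore is a hypothetical in-memory placeholder for the Redis store.
type memStore struct {
	mu    sync.Mutex
	posts map[string][]string
}

func newMemStore() *memStore {
	return &memStore{posts: map[string][]string{}}
}

// save appends fetched posts under their board; the mutex makes it safe
// to call from several workers at once.
func (s *memStore) save(board string, posts []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.posts[board] = append(s.posts[board], posts...)
}

// crawlOnce starts one goroutine per board, fetches each board once,
// and writes the results into the store. It returns any fetch errors.
func crawlOnce(boards []string, fetch fetchFunc, s *memStore) []error {
	var wg sync.WaitGroup
	perBoard := make([]error, len(boards)) // one slot per worker, no sharing
	for i, board := range boards {
		wg.Add(1)
		go func(i int, board string) {
			defer wg.Done()
			posts, err := fetch(board)
			if err != nil {
				perBoard[i] = fmt.Errorf("board %s: %w", board, err)
				return
			}
			s.save(board, posts)
		}(i, board)
	}
	wg.Wait()

	var errs []error
	for _, err := range perBoard {
		if err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	// Stub fetcher: a real one would call the API and decode JSON.
	fake := func(board string) ([]string, error) {
		return []string{"first post on " + board}, nil
	}
	s := newMemStore()
	errs := crawlOnce([]string{"tg", "tv"}, fake, s)
	fmt.Println(len(s.posts), "boards stored,", len(errs), "errors")
}
```

Passing the fetcher in as a function keeps the fan-out logic testable without hitting the network; a real version would presumably loop on a timer and write to Redis instead of a map.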
The hard part is how to represent this in Redis. It’s easy enough for threads and posts: something like THREAD::tg:105765609, which is a list of post numbers, and POST::tg:104161606 to represent an individual post. But how about terms? The easy answer is a list of SNAPSHOT::tg:someunixtime keys, each holding a sorted set of "Term" : count pairs.
I’d love to use Redis Streams, but a stream entry is just an ID (a millisecond Unix timestamp plus a sequence number) with a flat list of field-value pairs. A “field” is usually treated like a variable name, so it feels wrong for something like “Richard Stallman,” which has capitalization and spaces. So maybe XADD SNAPSHOT::tv * 1 "Richard Stallman" 2 "God" ... and then simultaneously a stream per term via XADD SNAPSHOT::tv:Richard Stallman? But then you have escaping problems because the key contains a space, so maybe it should instead be XADD "SNAPSHOT::tv:Richard Stallman" * realfreq 88 adjustedfreq 43 threadcount 5 postcount 3 opcount 1?
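Here is a rough Go sketch of the key scheme and the per-term XADD, using only the standard library. The helper names (threadKey, postKey, snapshotKey, termStreamKey, xaddArgs, encodeRESP) and the termStats struct are hypothetical. The encoder shows how a command looks in the Redis wire protocol (RESP), where every argument is a length-prefixed bulk string, so a key with a space only needs quoting when the command is typed at a prompt:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Key builders for the naming scheme sketched in the post.
func threadKey(board string, threadNo int64) string {
	return fmt.Sprintf("THREAD::%s:%d", board, threadNo)
}

func postKey(board string, postNo int64) string {
	return fmt.Sprintf("POST::%s:%d", board, postNo)
}

func snapshotKey(board string, unixTime int64) string {
	return fmt.Sprintf("SNAPSHOT::%s:%d", board, unixTime)
}

// termStreamKey names the per-term stream; the term may contain spaces.
func termStreamKey(board, term string) string {
	return fmt.Sprintf("SNAPSHOT::%s:%s", board, term)
}

// termStats mirrors the fields from the XADD example (hypothetical struct).
type termStats struct {
	RealFreq, AdjustedFreq, ThreadCount, PostCount, OpCount int
}

// xaddArgs builds the XADD argument list with an auto-generated ID ("*").
func xaddArgs(board, term string, s termStats) []string {
	return []string{
		"XADD", termStreamKey(board, term), "*",
		"realfreq", strconv.Itoa(s.RealFreq),
		"adjustedfreq", strconv.Itoa(s.AdjustedFreq),
		"threadcount", strconv.Itoa(s.ThreadCount),
		"postcount", strconv.Itoa(s.PostCount),
		"opcount", strconv.Itoa(s.OpCount),
	}
}

// encodeRESP serializes a command as a RESP array of bulk strings.
// Each argument is prefixed with its byte length, so no escaping is needed.
func encodeRESP(args []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args))
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a)
	}
	return b.String()
}

func main() {
	fmt.Println(threadKey("tg", 105765609))
	fmt.Println(postKey("tg", 104161606))
	fmt.Println(snapshotKey("tg", 1549411200)) // example timestamp only

	args := xaddArgs("tv", "Richard Stallman", termStats{88, 43, 5, 3, 1})
	fmt.Printf("%q\n", encodeRESP(args))
}
```

A Redis client library would normally do the encoding itself; the point is that handing the client a list of arguments, rather than one command string, sidesteps the quoting question.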
One other thing I was thinking about is having these “concepts,” with things like “Richard Stallman” each being an object. So “Richard Stallman,” “stallman,” “RMS,” and such would all ‘point’ to the same concept of “Richard Stallman.” But this dives into NLP, and it’s hard not to just hardcode all of it.
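The hardcoded version of the “concept” idea could look something like the sketch below. The alias table is hand-written for illustration, and the names aliases, normalize, concept, and countConcepts are hypothetical. This is a plain lookup, not real NLP:

```go
package main

import (
	"fmt"
	"strings"
)

// aliases maps a normalized surface form to its canonical concept.
// The entries are hand-written examples, not derived from data.
var aliases = map[string]string{
	"richard stallman": "Richard Stallman",
	"stallman":         "Richard Stallman",
	"rms":              "Richard Stallman",
}

// normalize lowercases a term and collapses surrounding and repeated whitespace.
func normalize(term string) string {
	return strings.Join(strings.Fields(strings.ToLower(term)), " ")
}

// concept returns the canonical concept for a term,
// or the term unchanged when no alias is known.
func concept(term string) string {
	if c, ok := aliases[normalize(term)]; ok {
		return c
	}
	return term
}

// countConcepts folds per-term counts into per-concept counts.
func countConcepts(termCounts map[string]int) map[string]int {
	out := make(map[string]int, len(termCounts))
	for term, n := range termCounts {
		out[concept(term)] += n
	}
	return out
}

func main() {
	counts := countConcepts(map[string]int{"RMS": 2, "stallman": 3, "God": 1})
	fmt.Println(counts["Richard Stallman"], counts["God"])
}
```

Folding aliases before counting would mean a snapshot stores one score per concept instead of one per spelling, at the cost of maintaining the table by hand.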
It’ll be easier to work on it one step at a time. I’ve got a pretty detailed data schema, so it’s just a matter of implementation now.