Hah! That's so awesome. When I spent an IISME internship summer at Intel years ago, I wrote a unit for middle school math students about faux-Bayesian spam filtering. ( http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) When introducing the concept of word frequency, we ran word counts on Tom Sawyer, Huck Finn and a half dozen other Project Gutenberg books.
Your Wordle reminds me of how far we had to look down the frequency list before we hit useful words, and how narrow the slice was. "Tom, the, and, but" aren't useful, and neither are the thousands of words that occur only once or twice in the novel.
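For anyone curious, that "narrow slice" can be sketched in a few lines of Python. This is only a rough illustration, not the original classroom unit: the function name `useful_slice` and the cutoffs `top_skip` and `min_count` are made up for the example, and real cutoffs would need tuning per book.

```python
import re
from collections import Counter


def useful_slice(text, top_skip=3, min_count=3):
    """Return (word, count) pairs from the middle of the frequency list.

    top_skip  -- hypothetical cutoff: drop this many of the most common words
    min_count -- hypothetical cutoff: drop words seen fewer times than this
    """
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() sorts by count, highest first.
    ranked = Counter(words).most_common()
    # Skip the very top of the list, then drop the rare tail.
    return [(w, c) for w, c in ranked[top_skip:] if c >= min_count]


sample = (
    "tom tom tom tom tom tom the the the the the and and and and "
    "raft raft raft river river river fence whitewash"
)
print(useful_slice(sample))
```

On a full novel the top of the list is names and function words, and the tail is thousands of one-off words, so only a thin band in between survives both cutoffs.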
This is why spam filtering is better done far above the per-user level.
Man, that's a lot of memories, a lot of data, triggered by a small black square with a big yellow "Tom." Thanks for the flashbacks!
That is fun. I love how Tom is precariously tipped on the edge of the Wordle, as if the very name is looking for an adventure of its own.