In particular, I’d like to see “smarter” text analysis for things currently handled by LIWC. I’ve seen people complaining on the support boards about their entirely secular writing being tagged as “Concerned about Religion”; I’ve had some of my own in-depth rants about music theory get this label. There seem to be many more possible concerns than the categories currently available; in particular, one related to the arts would be more meaningful than the current ones. I went and looked at the way my past entries have been analyzed, and couldn’t help but notice that the analysis seems to skip over “big” words and count only the more common ones. Moreover, for words that have multiple meanings, it picks a single meaning to count. For example, it thinks my use of the word “keyboard” is related to “work” when I’m actually talking about practicing on a piano keyboard.

I think the text analysis could be made more useful if it tried to discern the context of the words used before assigning meanings/concerns to them. This could be accomplished by detecting infrequently used words that have specific meanings, and then using those to infer the topic of the surrounding text. For example, the words “keyboard” and “counterpoint” may have more than one meaning, but the words “tritone” and “harpsichord” are pretty specific to music, so the text analyzer could properly interpret the surrounding words as also being related to music.
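
To make the idea concrete, here is a rough Python sketch of what I mean. Everything in it is made up for illustration: the word lists, the category names (including the “debate” default for “counterpoint”), and the function name are my own assumptions, not LIWC’s actual dictionaries or how the site really works.

```python
import re
from collections import Counter

# Hypothetical "anchor" words: rare words that point to exactly one topic.
ANCHOR_TOPIC = {"tritone": "music", "harpsichord": "music"}

# Hypothetical ambiguous words: a default category, plus topic-specific ones.
AMBIGUOUS = {
    "keyboard": {"default": "work", "music": "music"},
    "counterpoint": {"default": "debate", "music": "music"},
}

def categorize(text):
    """Count categories, letting anchor words decide how ambiguous words are read."""
    words = re.findall(r"[a-z]+", text.lower())
    # Pass 1: find which topics have at least one anchor word in the entry.
    topics = {ANCHOR_TOPIC[w] for w in words if w in ANCHOR_TOPIC}
    # Pass 2: count categories, preferring a detected topic over the default.
    counts = Counter()
    for w in words:
        if w in ANCHOR_TOPIC:
            counts[ANCHOR_TOPIC[w]] += 1
        elif w in AMBIGUOUS:
            senses = AMBIGUOUS[w]
            category = next(
                (senses[t] for t in sorted(topics) if t in senses),
                senses["default"],
            )
            counts[category] += 1
    return dict(counts)
```

With these toy lists, categorize("I practiced on the keyboard") counts “keyboard” as work, while categorize("The tritone sounded odd on the keyboard") counts both words as music. A real version would need much larger word lists and probably a smarter way to weigh conflicting topics, but the two-pass shape is the part I’m suggesting.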

One problem I can see, though, is that a more sophisticated text analyzer might also require more processing power, which could lead to more server strain if enough people use it. Given the recent need to move servers, that would be an understandable concern! I still think it would be cool, though. Maybe make it a patron-only feature?

Stephanie Van Aken on Sat, Mar 26
