The Good Patrons of 750 Words

A Note of Inspiration from Alex Bowe

Japanese Tokenizing

I am learning Japanese, so I would like to write in that language. However, Japanese runs its words together: お元気ですか? (O genki desu ka?) is actually considered 4 words, but there are no spaces between them.

There is a JavaScript library called TinySegmenter.js (http://chasen.org/~taku/software/TinySegmenter/) that tokenizes Japanese. Your psychological analytics won't work on the result, but that's okay.

Here is an example of how you might use it. You might want to run language detection first, and filter out space tokens, punctuation tokens, etc.

https://gist.github.com/alexbowe/8366628
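
For anyone who can't open the gist, here is a rough sketch of the same idea. It is not the gist's contents, just one way the word count could work. The names looksJapanese and countWords are made up for illustration, and the character ranges used for detection are a crude assumption (hiragana, katakana, and common kanji only). The segmenter is passed in as a parameter and is assumed to expose segment(text) returning an array of strings, which is how TinySegmenter is used.

```javascript
// Crude language check (an assumption, not a real detector): true if the
// text contains at least one hiragana, katakana, or common kanji character.
const JAPANESE_CHARS = /[\u3040-\u30ff\u4e00-\u9fff]/;

// A token counts as a word only if it contains a letter or a digit, so
// whitespace-only and punctuation-only tokens are filtered out.
const WORD_CHARS = /[\p{L}\p{N}]/u;

function looksJapanese(text) {
  return JAPANESE_CHARS.test(text);
}

// Hypothetical helper: counts words in `text`.
// `segmenter` is any object with a segment(text) method that returns an
// array of string tokens, e.g. an instance of TinySegmenter.
function countWords(text, segmenter) {
  const tokens = looksJapanese(text)
    ? segmenter.segment(text) // no spaces to split on, so tokenize
    : text.split(/\s+/);      // otherwise split on whitespace as usual
  return tokens.filter((token) => WORD_CHARS.test(token)).length;
}
```

With TinySegmenter.js loaded on the page, the call would look something like countWords("お元気ですか?", new TinySegmenter()). The exact count depends on how the library splits the sentence.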

New Feature Request Note from Alex Bowe on Fri, Jan 10
