I'm trying to work out how to distribute Japanese characters across the tiles of a Scrabble game.

Are there any pointers on how to do this? The only thing I know is to make more vowel tiles than consonant tiles.

Any ideas?

  • I mean hiragana. (2011-05-29)

(I am not an expert, but I'm going off of a writing systems class I took in college.)

What you're looking for is the relative distribution of the characters. I'm going to guess that you're working with hiragana, which is a moraic writing system: each character encodes a mora, a unit of sound somewhere between a phoneme and a syllable. Trying to bring in kanji, with its thousands of characters, would be impractical in a physical game.

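In English, letter frequencies are well known, and Scrabble's tile distribution and point values roughly follow them: a common letter like E gets many tiles and scores 1 point, while rare letters like Q and Z get a single tile and score 10.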

You'll want to do the same for Hiragana.

  1. Look for a large corpus of Japanese text (perhaps from literature, newspapers, etc.) – from a cursory glance, something like this may do.
  2. Convert all characters (kanji and katakana) to hiragana. I believe there are software tools to do this automatically.
  3. Run a frequency analysis (count how many of each character there is).
  4. Use the distribution to assign the number of tiles for each mora, rounding up or down. You probably want at least one of every mora.
  5. Also use the distribution to assign point values, with rarer moras worth more.
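Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: katakana are folded into hiragana with a fixed Unicode offset, and kanji are simply skipped, because converting them needs a morphological analyzer and a dictionary, which this doesn't attempt. The function names are made up for illustration.

```python
from collections import Counter


def katakana_to_hiragana(text: str) -> str:
    # Katakana U+30A1..U+30F6 sit exactly 0x60 code points above
    # hiragana U+3041..U+3096, so a fixed shift converts them.
    # Everything else (kanji, punctuation, the long-vowel mark ー) passes through.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def hiragana_frequencies(text: str) -> Counter:
    # Count each hiragana character (U+3041..U+3096) after folding katakana.
    # Kanji are NOT converted here; they are simply ignored.
    folded = katakana_to_hiragana(text)
    return Counter(ch for ch in folded if "\u3041" <= ch <= "\u3096")


if __name__ == "__main__":
    counts = hiragana_frequencies("ひらがなとカタカナ")
    total = sum(counts.values())
    # Print each character with its raw count and share of the total.
    for mora, n in counts.most_common():
        print(mora, n, f"{n / total:.1%}")
```

Voiced forms such as が are counted separately from か, since they are distinct characters.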

Now, some caveats (I'll add as I think of them):

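  • Hiragana (a "moraic" writing system) has 46 basic characters in modern use (48 if you count the obsolete ゐ and ゑ), plus voiced forms such as が and small kana such as ゃ, compared to 26 letters in English. That means many more distinct tiles, which may mess with your tile distribution.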
  • Some other issues are discussed in this thread.
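Steps 4 and 5 can be sketched as well. With this many distinct tiles, the "at least one of every mora" rule eats into the total, so this sketch reserves one tile per mora first and shares out the rest in proportion to frequency. The specific choices are assumptions for illustration, not a standard: rounding uses the largest-remainder method so the total comes out exact, and points are roughly the ratio of the most common mora's count to each mora's count, capped at 10. The function names and sample counts are hypothetical.

```python
import math
from collections import Counter


def allocate_tiles(counts: Counter, total_tiles: int) -> dict:
    # Every mora gets one tile first ("at least one of every mora").
    moras = list(counts)
    if total_tiles < len(moras):
        raise ValueError("not enough tiles to give every mora one")
    spare = total_tiles - len(moras)
    total = sum(counts.values())
    # Share the spare tiles in proportion to frequency...
    quotas = {m: spare * counts[m] / total for m in moras}
    tiles = {m: 1 + math.floor(q) for m, q in quotas.items()}
    # ...then round: leftover tiles go to the largest fractional parts
    # (largest-remainder method), so the tile total is exact.
    leftover = total_tiles - sum(tiles.values())
    by_fraction = sorted(
        moras, key=lambda m: quotas[m] - math.floor(quotas[m]), reverse=True
    )
    for m in by_fraction[:leftover]:
        tiles[m] += 1
    return tiles


def assign_points(counts: Counter, max_points: int = 10) -> dict:
    # Rarer moras score more: roughly (count of most common mora) / (this count),
    # clamped to 1..max_points. Assumes every count is positive.
    top = max(counts.values())
    return {m: max(1, min(max_points, round(top / c))) for m, c in counts.items()}


if __name__ == "__main__":
    counts = Counter({"い": 50, "の": 30, "ぬ": 5})  # made-up counts
    print(allocate_tiles(counts, 20))
    print(assign_points(counts))
```

A linear ratio is only one option; something flatter (for example based on a logarithm of the ratio) would keep mid-frequency moras from jumping in value too quickly.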

Added: I found a hiragana frequency table on this page about cryptography in hiragana. Their method was similar to ours, but it used a couple of news articles rather than a large corpus:

The table was constructed using the following steps.

  • The above news articles were converted into hiragana; the converted text is here. This requires a dictionary and a morphological analyzer. The morphological analyzer used was juman, which contains the necessary kanji dictionary.
  • Statistics were compiled counting individual hiragana characters.
  • @joriki: You're right, I forgot about that. I know that input methods for Mandarin Chinese that use the phonetic pinyin rely on context to sort out the most probable characters you're looking for. I assume that something similar could identify the most probable reading of the kanji by inspecting the context. You could store the probabilities, run through the corpus, and then output the expected number of times each mora was used. (2011-05-29)
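The expected-count idea from the comment above can be sketched like this. It is a simplification under stated assumptions: the reading table and its probabilities are invented illustration values, the probabilities are fixed per character instead of being chosen from context, and each kanji is read on its own even though real readings depend on the whole word (日本 is にほん). In practice the readings would come from a morphological analyzer such as juman. Each hiragana character of a reading counts as one unit, the same unit as the frequency table.

```python
from collections import defaultdict

# Hypothetical reading table: kanji -> [(reading in hiragana, probability)].
# These numbers are invented for illustration, not measured.
READINGS = {
    "日": [("ひ", 0.4), ("にち", 0.4), ("か", 0.2)],
    "本": [("ほん", 0.9), ("もと", 0.1)],
}


def expected_mora_counts(text: str, readings: dict) -> dict:
    # Expected number of occurrences of each hiragana character in `text`.
    expected = defaultdict(float)
    for ch in text:
        if ch in readings:
            # Each candidate reading contributes its characters, weighted by p.
            for reading, p in readings[ch]:
                for kana in reading:
                    expected[kana] += p
        elif "\u3041" <= ch <= "\u3096":
            # Plain hiragana is already known: weight 1.
            expected[ch] += 1.0
        # Anything else (kanji missing from the table, punctuation) is skipped.
    return dict(expected)


if __name__ == "__main__":
    for kana, n in sorted(expected_mora_counts("日本の本", READINGS).items()):
        print(kana, round(n, 2))
```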