
I'm trying to work out a tile distribution of Japanese alphabet characters for a Scrabble game.

Are there any pointers on how to do this? The only thing I know is that there should be more vowel tiles than consonant tiles.

Any ideas?

  • 0
    What do you mean by "Japanese alphabet characters"? As far as I'm aware, there are three scripts used to write Japanese, one ideographic and two syllabic; I don't know of any Japanese characters that are alphabetic in the sense of representing individual consonants. 2011-05-29
  • 0
    I mean hiragana. 2011-05-29

1 Answer


(I am not an expert, but I'm going off of a writing systems class I took in college.)

What you're looking for is the relative frequencies of the characters. I'm going to guess that you're working with Hiragana, which is a moraic writing system: each character encodes a mora, a unit of sound in between a phoneme and a syllable. Trying to bring in Kanji would be problematic in a physical game, since there are thousands of them.

In English, the letter frequencies are known, and the tile distribution and the point value of each tile roughly follow them: common letters get more tiles and fewer points.

You'll want to do the same for Hiragana.

  1. Look for a large corpus of Japanese text (perhaps from literature, newspapers, etc.) – from a cursory glance, something like this may do.
  2. Convert all characters (kanji and katakana) to hiragana. I believe there are software tools to do this automatically.
  3. Run a frequency analysis (count how many times each character occurs).
  4. Use the distribution to assign a number of tiles to each mora, rounding up or down as needed. You probably want at least one tile for every mora.
  5. Also use the distribution to assign point values.
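
Steps 3 to 5 can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the text has already had its kanji converted to kana by some other tool, the function names and the 100-tile total are made up for illustration, and the "rarer means more points" rule is just one plausible choice. Katakana is folded into hiragana using the fixed offset between the two Unicode blocks.

```python
from collections import Counter

def to_hiragana(text: str) -> str:
    """Map katakana (U+30A1..U+30F6) onto hiragana via the fixed 0x60 offset."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def hiragana_counts(text: str) -> Counter:
    """Count hiragana characters only (U+3041..U+3096); everything else is ignored."""
    return Counter(ch for ch in to_hiragana(text) if "\u3041" <= ch <= "\u3096")

def tile_counts(counts: Counter, total_tiles: int = 100) -> dict:
    """Proportional share of the tiles, rounded, with at least one tile each."""
    n = sum(counts.values())
    return {ch: max(1, round(total_tiles * c / n)) for ch, c in counts.items()}

def point_values(counts: Counter, max_points: int = 10) -> dict:
    """Assumed scoring rule: points grow as a character gets rarer than the commonest one."""
    top = max(counts.values())
    return {ch: min(max_points, max(1, round(top / c))) for ch, c in counts.items()}

counts = hiragana_counts("かたかな カタカナ")
print(tile_counts(counts, total_tiles=8))
print(point_values(counts))
```

Because of the rounding and the one-tile minimum, the tile counts will not always add up to exactly `total_tiles`, so some adjustment by hand would probably still be needed.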

Now, some caveats (I'll add as I think of them):

  • Hiragana (a "moraic" writing system) has 46 basic characters in modern use (48 counting the obsolete ゐ and ゑ), which means that there will be many more distinct tiles than the 26 letters in English. This may mess with your tile distribution.
  • Some other issues are discussed in this thread.

Added: I found a hiragana frequency table from this page on cryptography in hiragana. Their method was somewhat similar to the one above, but used a couple of news articles rather than a large corpus:

The table was constructed using the following steps.

  • The above news articles were translated into hiragana. This is here. This requires a dictionary and morphological analyzer. The morphological analyzer used was juman, which contains the necessary kanji dictionary.
  • Statistics were compiled counting individual hiragana characters.
  • 0
    I wonder how accurate an automatic conversion of kanji to hiragana would be. Most kanji have different readings, so the conversion would need to have enough information about how they're being used to determine the right reading. 2011-05-29
  • 0
    @joriki: You're right, I forgot about that. I know that input methods of Mandarin Chinese using the phonetic pinyin use context to sort out the most probable characters that you're looking for. I assume that something similar could point out the most probable reading of the kanji by inspecting the context. You could store the probabilities, run through the corpus, and then output the expected number of times a mora was used. 2011-05-29
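
The expected-count idea in the last comment could look roughly like the sketch below. It is a simplification: the `READINGS` table and its probabilities are hypothetical placeholders rather than measured values, and each kanji gets one fixed set of probabilities instead of context-dependent ones, which a real morphological analyzer would supply.

```python
from collections import defaultdict

# Hypothetical reading probabilities, for illustration only. Real values would
# come from a dictionary or morphological analyzer and would depend on context.
READINGS = {
    "日": {"ひ": 0.5, "にち": 0.5},
}

def expected_counts(text: str, readings: dict) -> dict:
    """Expected number of occurrences of each hiragana character in the text."""
    expected = defaultdict(float)
    for ch in text:
        if ch in readings:
            # Spread the kanji over its readings, weighted by probability.
            for reading, p in readings[ch].items():
                for kana in reading:
                    expected[kana] += p
        elif "\u3041" <= ch <= "\u3096":
            # Plain hiragana counts once.
            expected[ch] += 1.0
    return dict(expected)

print(expected_counts("日に", READINGS))
```

The resulting fractional counts can be used in place of plain counts when deciding tile numbers and point values.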