I am trying to find out whether people pronounce one kind of word (say, singular nouns like "cat") with a longer duration than another kind (say, plural nouns like "cats"). I had $18$ people pronounce words from lists of $20$ word pairs. I would like to apply either a paired $t$ test or a Wilcoxon signed-rank test.
The question is: should I take each person's average duration for the singular nouns and compare it to their average duration for the plural nouns (meaning that I would be comparing $18$ pairs of averages)? Or is it possible to just use the raw data (meaning that I would be comparing $360$ pairs of words, i.e. $18\times20$)? Which is the preferred approach?
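To make the two options concrete, here is a minimal sketch using only the Python standard library. The durations are simulated, hypothetical values (a made-up per-speaker offset plus noise), not real measurements, and `paired_t` is a hypothetical helper that computes the paired $t$ statistic $\bar d / (s_d/\sqrt{n})$ with $n-1$ degrees of freedom. It computes the statistic once on the $18$ per-speaker mean differences and once on all $360$ word-pair differences.

```python
import math
import random
import statistics


def paired_t(diffs):
    """Hypothetical helper: paired t statistic and df for a list of differences."""
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical simulated durations in ms; NOT real data.
random.seed(0)
N_SPEAKERS, N_PAIRS = 18, 20
data = []  # rows of (speaker, singular_ms, plural_ms)
for s in range(N_SPEAKERS):
    speaker_offset = random.gauss(0, 30)  # assumed per-speaker speaking-rate shift
    for w in range(N_PAIRS):
        singular = 400 + speaker_offset + random.gauss(0, 20)
        plural = 410 + speaker_offset + random.gauss(0, 20)  # assumed effect, illustration only
        data.append((s, singular, plural))

# Option 1: one averaged difference (plural - singular) per speaker -> 18 pairs.
per_speaker = []
for s in range(N_SPEAKERS):
    rows = [r for r in data if r[0] == s]
    per_speaker.append(
        statistics.mean(r[2] for r in rows) - statistics.mean(r[1] for r in rows)
    )
t1, df1 = paired_t(per_speaker)

# Option 2: every word pair as its own pair -> 360 pairs.
all_diffs = [r[2] - r[1] for r in data]
t2, df2 = paired_t(all_diffs)

print(f"per-speaker averages: t = {t1:.2f}, df = {df1}")
print(f"raw word pairs:       t = {t2:.2f}, df = {df2}")
```

With a balanced design the mean difference is identical in both options; what changes is the number of pairs and hence the standard error and degrees of freedom. The paired $t$ test (and the Wilcoxon signed-rank test) treats the pairs as independent, which is worth keeping in mind when $20$ of the $360$ pairs come from each speaker. In practice, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` compute either test on two paired arrays.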