Monday, November 06, 2006

Thesis update: Draft chapter 2 completed; problems testing for statistical significance

I finished the draft of chapter 2 (Methodology) on Sunday and sent it off to my thesis director, along with five appendices. This totalled about 25 double-spaced pages in all, and took about one week and eight iterations to complete. A link to the draft of chapter 1, which introduces the topic and the literature, can be accessed from this post.

Next up: presentation of data. This is going to be tricky: while I have lots of data to present, only a small amount directly applies to my tests. Much of the data will therefore be distilled into percentages and ratios presented in easy-to-read charts, rather than the raw datasets in my giant Excel document.

But one piece of data that I am really agonizing over right now is statistical significance for the Yoshikoder-generated data. I am not the first historian or media scholar to run full speed into the brick wall of complicated statistical tests, and I certainly won't be the last. I just spent three hours going over an old precis from my proseminar (Earl Babbie's The Practice of Social Research) and various websites, evaluating statistical methods that might help me judge the quality of the data I have collected. My father suggested Student's t-test. Babbie described regression analysis. I also looked at an online chi-square tutorial. Another site pointed to the importance of two-tailed significance tests.
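To make the chi-square idea concrete, here is a minimal sketch, assuming the data can be arranged as a 2x2 table of counts: for example, hits on one keyword versus all other words, in the samples for two different years. The function name chi_square_2x2 and the counts in the usage line are hypothetical, not taken from my spreadsheet. With one degree of freedom, the p-value from this test matches a two-sided (two-tailed) comparison of the two proportions. One caution: treating every word as an independent observation is a strong assumption for running text, so a result like this can overstate significance.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no Yates correction) on a 2x2 table of counts.

    Hypothetical helper for illustration. Returns (statistic, p_value).
    With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if year and word type were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: [keyword hits, other words] for year A, then year B
stat, p = chi_square_2x2([[30, 970], [60, 940]])
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```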

One problem with the Yoshikoder data quickly became apparent: not all years and "sample types" used random sampling, which both Student's t-test and the chi-square test assume. If the number of relevant news items of a certain type was between five and ten, I used them all, i.e., no random sampling. Even for the sample types/years that did have enough news items (more than 10) to draw a random sample of ten using the Research Randomizer tool, the resulting yearly samples were saved into a single text file for Yoshikoder analysis, rather than analyzed separately with the results recorded in the spreadsheet. Concatenating the samples saved lots of time, but I believe it eliminates the possibility of applying the tests of statistical significance listed above. (Or does it? People with statistics expertise are welcome to comment below or email me.)
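A minimal sketch of why concatenation gets in the way of a t-test: the test compares means relative to the spread *within* each group, so it needs one value per news item. The function welch_t and the per-item keyword counts below are hypothetical. I have swapped in Welch's variant of the t-test, which does not assume the two years have equal variances. A concatenated file yields only one number per year, and with a single value there is no variance to estimate.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test.

    Hypothetical helper: each list holds one value per news item
    (e.g. a keyword count per article), so each needs at least two items.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two items to estimate variance")
    # Squared standard error of each sample mean
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical per-item keyword counts for two yearly samples
t, df = welch_t([4, 6, 5, 7, 3], [8, 9, 7, 10, 6])
print(f"t = {t:.2f}, df = {df:.1f}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a two-sided p-value. Either way, the random-sampling assumption still applies.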

This shortcoming will be acknowledged in the "research limitations" section of the thesis, in the second draft.
