Showing posts with label Quantitative Research.

Saturday, August 30, 2008

In the digital age, Widener is "almost a museum"

Earlier this week, I was lamenting the state of research and the dissemination of knowledge in academia. Despite the incredible tools at the disposal of students, scholars, and professors, paper is still the medium of choice when it comes to publishing research and sharing knowledge.

I am not alone. This morning I was reading the September-October issue of Harvard Magazine, and spotted this quote from Venkatesh Narayanamurti, the outgoing dean of Harvard's School of Engineering and Applied Sciences:
"I believe that the liberal-arts education of the twenty-first century has to be different," he says, noting that information is no longer centered in Widener Library. "The library made Harvard -- we have always had the rarest things, the best repository of knowledge, [but] information now is digital; it is on the Web. Widener Library is very valuable, but it is almost a museum."
When it comes to publishing, the Extension School is also very much oriented toward paper. Theses are bound in buckram and end up on shelves in Grossman Library. They may never be seen or read outside of the university community.

I really hope to see the Extension School and other academic units at Harvard embrace digital publishing and other Web-based ways of distributing knowledge in the next few years, so our collective efforts can be truly shared with the world, rather than being restricted to the museums of paper that dominate the campus.

Widener Library, photo by Ian Lamont

Update: I just found out that in February of 2008, Harvard's Faculty of Arts and Sciences approved a plan that will "post finished academic papers online free, unless scholars specifically decide to opt out of the open-access program." The source indicates that the policy applies to professors, but it's not clear whether student papers or research will be published online as well.

Tuesday, August 26, 2008

Thoughts on research, and saved by Scribd

When scholars from the year 2058 look back on the current state of academic research and the dissemination of knowledge, they surely will marvel at the fact that so much of it remained oriented toward printed words on paper.

It is a surprising situation. Never mind that nearly all educated members of early 21st century society are already familiar with the World Wide Web, the most extensive and accessible publishing and communications tool ever invented. Despite this, many facets of the academic world remain firmly planted in the ways of the early 20th century. Whether it's writing a term paper or conducting a major research project, the fruits of students' and scholars' efforts usually end up as printed sheets of paper destined for a professor's mailbox, a filing cabinet or a university library. Even a doctoral dissertation that takes years to complete is probably going to exist as a paper hard copy in just one or two locations. The insights contained in it may never be read by more than a handful of people.

This is not to suggest that academics are Luddites. Far from it -- most students and educators are very familiar with email, search engines, online databases, and Microsoft Word. But even if students use software programs to make and distribute a term paper or thesis proposal, electronic copies hardly ever venture beyond the hard drives of the students who created them, or the inboxes of the professors who received and graded them. On occasion, high-level research will be deemed good enough for a wider audience, but all too often these works remain restricted to books or journal articles that can only be seen in university libraries or expensive, password-protected databases. Fifty years from now, the scholars of the future will marvel at all of the ideas, hypotheses, evidence, and analysis that were expressed but shared only with a limited slice of humanity, despite the ubiquity of the Web and the many software tools at our disposal to share them with a much wider audience. This system will be viewed not only as inefficient, but also as isolating researchers from potential sources of knowledge and preventing them from making discoveries and improving our understanding of the world around us.

But there is hope. I have mentioned initiatives at MIT, Berkeley, and elsewhere that are attempting to leverage the power of the Web to spread knowledge more widely (see "Online education, sharing knowledge, and a proposal for Harvard" and "UC Berkeley's free lectures on YouTube"). Harvard Extended itself represents my own personal effort to share my experiences, observations, and research findings with a wider audience, and has succeeded beyond my wildest expectations -- Google Analytics tells me that more than 3,000 visits to Harvard Extended have taken place in the past 30 days, and nearly 85,000 visits have occurred since I first started using the tool in May of 2006.

Still, I want to do more. My blogging on Harvard Extended will come to an end in the next week, and it bothers me that the class papers I worked so hard on over the years do not have a permanent online home. Collectively, they took many hundreds of hours to research and write, and were shaped by my interactions with Extension School instructors, including members of Harvard's faculty. What a waste if they were to be consigned to a box of old papers in my basement, or a file directory on my hard drive. When I was still a student at the Extension School, I posted some of them to a fas.harvard.edu Web server. Unfortunately, I lost my FAS computing privileges when I graduated earlier this year, but I think I've found an alternate solution: Scribd.

Scribd is kind of like the YouTube of electronic documents. Registered users can upload their PDF or Word documents, PowerPoint presentations, and random .txt scribblings. Anyone with a Flash-enabled Web browser can view them, or even embed them on their own websites, just like you can do with YouTube videos. The database is searchable and indexed by Google, meaning that people anywhere can readily find specific documents, if they use the right search terms.

So, I've taken a half-dozen papers and uploaded them to Scribd. The idea is to share them with my readers on Harvard Extended, and anyone else who finds them interesting. I've linked them below, and embedded one of them in this post -- my final research paper for HUMA E-105 (Survey of Publishing, from Text to Hypertext). You can read them in your Web browser, or download a PDF copy, but I've disabled text and Word exports to discourage plagiarism. Here are the papers, starting with the research proposal I prepared as part of my proseminar back in the winter of 2003:

Defining a Territorial Sea: China's South China Sea Policy in the 1950s and its 1958 Declaration on the Territorial Sea (research proposal)
  • January 2004. Harvard DCE/SSCI E-100B (Graduate Research Methods and Scholarly Writing in Social Sciences), Joe and Doug Bond, Weatherhead Center for International Affairs, Harvard University
Historical Nationalism: How Interpretation of China's Past is Used to Build Unity in the Present
  • August 2004. Harvard DCE/Archaeology S-171 (Archaeology of the Silk Road), Irene Good, Peabody Museum, Harvard University
China's Emerging Overseas Chinese Policy in the Late 1970s and Implications for Ethnic Chinese Communities in Vietnam and Kampuchea
  • May 2005. Hist E-1834 (Chinese Emigration in Modern Times), Professor Philip Kuhn, Harvard University
Evaluating Official Attitudes Toward Post-Mao Chinese Film Through a Quantitative Lens
  • August 2006. History S-1855 (Film and History in Postwar Japan and Post-Mao China), Prof. Charles Hayford, Visiting Scholar, History, Northwestern University
Proposal for a Thesis in the Field of History in Partial Fulfillment of the Requirements for the Master of Liberal Arts Degree
  • February 2006. Prof. Donald Ostrowski and Prof. Alastair Iain Johnston, Harvard University
The Rise of the Press in Late Imperial China
  • November 2007. HUMA E-105 (Survey of Publishing, from Text to Hypertext), Matthew Battles, Museum of Fine Arts, Boston
I am also embedding the last paper I ever completed for the Extension School in January of this year, which was also for Battles' excellent survey class. It's quite fitting that it should end up here, as the class discussed the history of written language from the time of the Sumerians through Gutenberg's printing revolution and finally the beginning of the current publishing revolution taking place on the Web. I took things a step further, and looked at emerging Web-based software technologies and photorealistic 3D environments. It's entitled Video, Computer-Generated Environments and the Future of the Web.

One thing that's missing from this small collection of papers is the most important paper of my Extension School career: my thesis (title: Making a Case for Quantitative Research in the Study of Modern Chinese History: The New China News Agency and Chinese Policy Views of Vietnam, 1977-1993). There are several reasons I have not included it here. While Scribd is a very easy way to host documents, it lacks a vetting process and a reputation for reliability. The contents of an academic journal will have been vetted by experts and editors, and quality will generally be high. On Scribd, anybody can publish anything without it being vetted by anyone, and quality is mixed. For academic papers published on Scribd, the good appears alongside the bad. You'll find astounding creative works and rigorously designed research projects, as well as limp efforts at scholarly writing and even deliberate misinformation. Users can flag offensive content and copyright violations, but the process is flawed and leaves a lot of bad content on Scribd's servers.
Interesting or quality content can also be highlighted by readers and illuminated with comments, but this system is imprecise in that it does not differentiate between praise from a 15-year-old kid trying to finish his homework and praise from a 60-year-old university professor who stumbles upon a great paper on Scribd through a Google search.

You'll have to take my word that all of the above papers were submitted to Harvard faculty or Extension School instructors for review, and all received excellent grades. However, the weaknesses in the Scribd system have convinced me to hold off on reposting my thesis on scribd.com. I want it to have the largest possible impact on my field, and I don't believe it will have that impact if posted to Scribd. Instead, I am holding out hope for a Harvard-sponsored solution. Nearly two years ago, I petitioned the Extension School to archive master's theses in the same electronic database used for doctoral dissertations at Harvard, ProQuest UMI (Update: My thesis is now available through UMI/ProQuest). While this is a closed database that can only be accessed through university library systems, it is restricted to vetted, accepted research from university master's and doctoral programs. It is widely used in academic circles -- in fact, the literature review in my thesis referenced several dissertations that I had located in the ProQuest UMI database. I hope that someday my own thesis might also be useful to future scholars of modern Chinese history, Cold War history, and Chinese media studies, if Harvard decides to extend this resource to ALM theses from the Extension School.

Friday, March 28, 2008

The downside of picking a "hot" thesis topic

Chris over at the Mission Control ALM blog shares some of the challenges of picking a "hot" thesis topic. Chris' thesis will explore issues related to contractors in the Iraq war. While these issues have been in the headlines in recent months, and there are tons of journal articles, books, and primary sources to reference, the challenge is finding an original topic that hasn't been explored:
I see now that I picked a topic that had some research difficulties built into it. If you choose a subject that's current, hot, and much-written about, it's that much harder to sift through what's out there and find a thread that hasn't been worked on or needs more done on it. First, start with an avalanche of materials -- primary source and secondary. Then, skim through it and see what hasn't been covered. Next, come up with an idea that is both insightful and substantive, that can be explored, and for which you can find solid evidence.
I had a different sort of challenge for my thesis -- identifying existing research for an obscure topic, and whittling down the focus to something manageable and testable. Primary sources weren't a problem (I used thousands of articles from the Xinhua News Agency) but there wasn't much recent literature on the foreign policy issues involved -- in fact, the two sources that I tested my data against were a journal article from the late 1980s and a book from the early 1990s. And while the computer content analysis literature is quite extensive, I was unable to find any specific studies that were based on Xinhua's English-language service.

I suppose a safe middle ground between my approach and Chris' approach would be finding a topic that's not too hot, but has enough recent literature behind it to point to a clear path for new research.

But that would be too easy, wouldn't it?

Tuesday, February 26, 2008

A.B.T.

A few years back, I had a discussion with one of the ALM administrators about the graduation rate for the ALM program. He revealed that the all-time graduation rate was just 52% for the liberal arts concentrators (i.e., excluding the IT and professional ALMs such as journalism, biotechnology, management, and museum studies, which are not liberal arts degrees).

There are a couple of issues raised by this figure. First, while the graduation rate may seem unusually low, it's in line with the national average for graduate programs, says the Extension School. Second, it does not include the many people who take lots of graduate-level classes at the Extension School with the intention of officially entering the ALM program, but never matriculate, either because the proseminar is too difficult or because they move away or stop taking classes before they have a chance to matriculate.

Of those students who do matriculate, but still don't graduate, there's an additional factor that comes into play: ABT status. They've completed all of the required coursework, including the proseminar, field courses, writing-intensive classes, and electives, and only have one hurdle to go: The thesis. Until they get that out of the way, they are "A.B.T.", or "all but thesis" (not to be confused with A.B.D., which refers to doctoral candidates who haven't completed their dissertations).

The thesis is what makes the master's program at the Extension School so special. It entails serious research that can take years to complete, and lets students work closely with some of the top academic experts in the world in their respective fields of study -- Harvard professors in the Faculty of Arts and Sciences, the Law School, the Kennedy School of Government, the Medical School, etc. The thesis goes beyond what many "traditional," full-time master's programs require, including those at Harvard's other graduate schools (for instance, this Master of Education program at the Harvard Graduate School of Education requires only eight classes; there is no thesis). Approved ALM theses have been turned into journal articles, and have served as stepping stones to advanced degrees (at Harvard and elsewhere) and to careers in academia.

As I've discussed many times on this blog, the ALM thesis is a huge challenge -- not just an intellectual challenge, but also a management challenge that requires tremendous organizational skills and lots of time.

It's also mostly self-directed. Students have to conduct the initial research inquiries, choose topics, and compose thesis proposals on their own, and follow the guidance of the research advisor and thesis director in terms of conducting additional research and developing the thesis itself. If a student procrastinates, fails to complete a certain step, or doesn't hear back from his or her TD in a timely manner, the thesis will die -- no one is going to do the work for the student, or constantly nag the professor on his or her behalf.

It therefore doesn't surprise me that so many ALM candidates fail to receive their degrees. Moving away or stopping classes are possible reasons, but I think the thesis requirement is a tough hurdle for many people. If a student is ABT for too long, his or her ALM candidacy will come to an end. Not only is there a five-year requirement for completing the degree, but also there is a nine-month window to write the thesis itself.

I've known two people who matriculated into the ALM program but never finished. Both were ABTs. The first was a Literature and Creative Writing concentrator who finished all of her coursework, and couldn't decide on a thesis topic. After a few years, she didn't really feel interested anymore. Later, she took another Harvard Extension School class relating to legal issues, and decided that this topic area was more intellectually rewarding. However, she moved away before she could take any more law or government classes, and the five-year limit eventually expired. The other ABT has also fought procrastination, but has an incredibly demanding job that places very real limits on the amount of time and effort that can be devoted to thesis research.

I can relate to both situations. If I had lost my passion for Chinese history, media, and computer-aided research, getting started on my thesis would have been difficult, and completing it would have been nearly impossible. And if I had my current job -- a new position that requires 10-hour workdays and frequent travel -- when I started my thesis research back in 2005, it's highly unlikely I could have completed it, without burning myself out or putting serious strains on my family.

Thursday, January 10, 2008

Dissertation horror stories

A fellow Harvard ALM blogger who operates the Mission Control blog writes an interesting aside at the end of a recent post, relating to doctoral dissertations that went horribly awry -- like the graduate student who discovered that his "great idea was completely wrong, no such thing existed," but was still forced to run with it by his advisor. Read the reason why, and a few other PhD tales of woe, at the end of this post.

Sunday, December 30, 2007

Generation G in Taiwan: Age gaps in Internet usage and blogging

On The Digital Media Machine blog, I recently discussed Generation G -- the under-40s who belong to the video game generation. I wrote:
Most people in this demographic grew up with games, and many of them still play now. They are familiar with gaming conventions relating to movement, exploration, cooperation, competition, and communication. Additionally, interaction with video games from an early age has created a foundation of familiarity and interest in computing technologies.
While I noted that more than 80 million people in the United States belong to this demographic, I did not get into the international dimension. According to the U.S. Census Bureau, there were nearly 4.5 billion people under the age of 40 as of mid-2007. Obviously, many of those in developing countries may never have seen a video game console or touched a computer, but in many parts of Europe and Asia, video games, computers, and the Internet are a way of life for people in this age group.

The ESWN blog found a report that supports the Generation G hypothesis in Taiwan. The United Daily News (lian he bao, 聯合報) reported the results of a telephone survey of 15,007 people from all over Taiwan that polled them on their 'Net habits, and broke down the results by age. The inset graphic is from the United Daily News website, and shows the data. Not surprisingly, almost 100% of the youngest bracket (aged 12 to 20) were Internet users. Most of the 21-30 and 31-40 groups were also online. But there was a steep dropoff from the 30-somethings to the 40-somethings, and just over one in five of the over-50s were online:
Age 12-20: 99.8%
Age 21-30: 94.4%
Age 31-40: 84.2%
Age 41-50: 58.6%
Age 51+: 21.9%
The survey also asked about blogging, and I was quite surprised to see how active Taiwan's teenagers were in this respect: Nearly half of the 12-20 year olds said they blog, and about 30% of 20-somethings do the same. Taiwanese in their 30s are far less likely to blog, with just 12.5% saying that they maintain one. This matches my own experience -- most of my Taiwanese friends are in their 30s and 40s, and I only know one who has a blog.
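To put numbers on that dropoff, here is a small Python sketch that takes the five percentages from the survey list above and computes the percentage-point drop between adjacent age brackets. Only the figures come from the United Daily News data; the variable and function names are illustrative.

```python
# Internet usage rates (%) by age bracket, copied from the survey figures above
usage = [
    ("12-20", 99.8),
    ("21-30", 94.4),
    ("31-40", 84.2),
    ("41-50", 58.6),
    ("51+", 21.9),
]

def bracket_drops(rates):
    """Return (younger bracket, older bracket, percentage-point drop) for each adjacent pair."""
    drops = []
    for (age_a, pct_a), (age_b, pct_b) in zip(rates, rates[1:]):
        # Round to one decimal place to hide floating-point noise (e.g. 5.3999999...)
        drops.append((age_a, age_b, round(pct_a - pct_b, 1)))
    return drops

for age_a, age_b, drop in bracket_drops(usage):
    print(f"{age_a} -> {age_b}: drops {drop} points")
```

Run as-is, it reports drops of 5.4, 10.2, 25.6, and 36.7 points, so the falloff between the 31-40 and 41-50 brackets is steep, and the one between the 41-50 and 51+ brackets is steeper still.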

Wednesday, December 26, 2007

Research advice

Some solid advice for grad students considering major academic research projects:
1) Use graduate school to tech up. You'll have time to learn how to save the world later, when you're actually in it. Learn all of the theoretical, statistical and other difficult-to-acquire skills you can while in grad school, because you won't have the time later on. You, your cause, and your job prospects will be well-served by the technical skills you build.

2) Hang in there. In the first year of any grad program you will encounter a lot of required material that will feel too theoretical, too divorced from social change, and (occasionally) like too much nonsense. Much of it is good for you (see point 1), even if it doesn't feel like it at the time. After a year of metrics and micro theory, I was ready to run to the real world to do what I thought I really wanted to do. The best advice I ever got (from one of my pre-PhD advisers) was, "Shut up and hang in there; by your second to third year you will discover all the people doing interesting applied work soon enough and be free to work on whatever you want by your third year." He was right.

3) Take chances. The second best piece of advice I ever received came from my dissertation chair, shortly after my oral examinations committee told me that my prospectus was poorly thought out, uneconomic, and overly risky. They were 100% right, and I benefitted from hearing it (although at the time I was miserable). Where I think they were wrong is that they told me to abandon my plans for risky and expensive field work. They favored the less risky route that could get me to a completed dissertation faster. My chair's response: "Hey, if you really want to do this, why not? Give it a shot. If it doesn't pan out after three months, then come back and work on something else. Worst case scenario: you lose a few thousand dollars and a summer, but you have a great experience." I plan to give the same advice to my students.

4) But minimize your risks by being prepared. Don't embark on a big project, especially field work, without a solid hypothesis, research design, and plan. Think through the theory beforehand. Write down your assumptions, your logic, and your econometric regressions before you collect data. Especially write out your regressions. I am still guilty of rushing to the field too quickly, and am continually reminded of the costs.
The author (Chris Blattman, an assistant professor of political science and economics at Yale) has six additional pieces of advice. The post is intended for people considering economics-related research as part of a PhD program, but some of the tips can be applied to what ALM students are doing at the Extension School. Tip #4, above, seems especially relevant -- in my proseminar and in the ALM thesis writers' workshops, other students would sometimes propose complicated paper-based surveys, or ask extremely broad research questions (e.g., "does religion encourage war?"). Fortunately, the ALM program has processes intended to help candidates find realistic, solid research topics -- namely, the proseminar and the thesis proposal.

(Thanks to Greg Mankiw for the link)

Friday, November 16, 2007

Strange database queries at the Harvard Business School, and a new model for downloading online music

My Computerworld blog was recently updated with some new functionality and a new name -- The Digital Media Machine. The focus is mostly new media technologies, ranging from the Internet to virtual reality, but I also touch on some subjects that might be interesting to my Harvard Extended audience. For instance, yesterday I blogged about a strange incident at the Harvard Business School's Baker Library:
The Crimson, the student-run newspaper at Harvard, has a report of an unusual incident in a campus library. Administrators at the Harvard Business School library were forced to block a user's IP address from accessing Factiva, an online database of news articles and other text documents, after determining that the user had downloaded millions of articles in the span of a few months.
It turns out that the user in question was (probably) building a very large census of news articles and other text documents for a computer content analysis, using a script that scraped the articles from the password-protected Factiva database. I can totally sympathize: As some of you may remember, I also carried out a very extensive content analysis of news articles from China's Xinhua News Agency for my thesis, and was frustrated by the manual processes involved in getting samples and query results from LexisNexis Academic.
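For readers who haven't encountered computer content analysis, here is a minimal Python sketch of one common approach, dictionary-based word counting: each article is tokenized, and words matching predefined categories are tallied. It is an illustration under stated assumptions, not a reproduction of the Factiva user's script or of my thesis methodology; the category names, keyword lists, and sample sentences are all hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical category dictionary: each category maps to a set of keywords.
# These word lists are placeholders for illustration only.
DICTIONARY = {
    "hostility": {"aggression", "invasion", "hegemonism", "provocation"},
    "cooperation": {"friendship", "cooperation", "talks", "agreement"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_categories(text, dictionary=DICTIONARY):
    """Count how many tokens in one article fall into each dictionary category."""
    tokens = Counter(tokenize(text))
    return {
        category: sum(tokens[word] for word in words)
        for category, words in dictionary.items()
    }

def score_articles(articles, dictionary=DICTIONARY):
    """Sum the category counts across a collection of article texts."""
    totals = Counter()
    for text in articles:
        totals.update(count_categories(text, dictionary))
    return dict(totals)

if __name__ == "__main__":
    # Made-up sample sentences, not real news articles
    sample = [
        "Spokesman condemns the invasion and regional hegemonism.",
        "The two sides held talks and signed a trade agreement.",
    ]
    print(score_articles(sample))
```

A real study would add steps this sketch skips, such as handling multi-word phrases and word stems, and normalizing counts by article length so that long and short articles can be compared.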

There's another new post on my Computerworld blog that might interest anyone who pays for online music: A proposal to revamp the per-song and per-album pricing model from a flat fee (e.g., iTunes' 99-cents-per-song scheme) to a scaled pricing model that actually evaluates whether or not you value the song.

Friday, November 09, 2007

Hedge fund uses Harvard Extension School distance education class as backup training

Spotted in my Google blog search RSS feed: David Kane of Kane Capital Management -- a company that operates a hedge fund -- requires summer interns from Williams College to have a solid grounding in statistics. If they are unable to take the appropriate course at Williams, he has them take Government E-2001 ("a course [that] gives you the tools to build statistical models and useful in real social science research") through the Extension School's distance education offerings. He pays, too.

This is one example of how distance education at Harvard has potential applications beyond enabling Extension School students to take coursework online. Some of these high-quality classes can be used for workforce education in certain fields. Conceivably, these classes could also serve as for-credit substitutes at other colleges or universities that don't offer such courses, offer them infrequently, or need to serve students who are not on campus because of a disability, military service, overseas study, etc.

The class that Kane refers to looks quite interesting. While Government E-2001 is "recommended" for government concentrators, it is not required, and I have the feeling that a lot of Extension School students aiming for a government ALM shy away from taking it, considering that they already have one difficult requirement to get out of the way (the graduate proseminar) and that it involves a subject so many social sciences concentrators dread -- math. In my experience, very few ALM Government or History concentrators like math or attempt to use quantitative methodologies in their theses. Others may not realize until it is too late that they want to use statistical analysis instead of more traditional qualitative approaches.


In hindsight, I wish I'd taken this class (or one like it) before I started my thesis, which used a quantitative methodology to study Chinese foreign policy during the Deng Xiaoping era. While I had studied computer content analysis schemes during my graduate proseminar in 2003, I didn't have any training in statistics when I started my research in 2005 -- I basically had to do a lot of extra reading on my own, and get advice from my thesis director and a few others in order to develop my models and analyze the data.

Monday, September 17, 2007

Thesis update: Done!

I can't believe it. I'm done. Done! Here's how it went down. My thesis director, Professor Alastair Iain Johnston, signed off in the late spring. I then had to go through a few formatting revisions with my Extension School research advisor, Dr. Donald Ostrowski. That took the better part of the summer, in part because he has a lot of other work to do, and I was in Asia for State of Play and a family trip to Taiwan for two weeks in August. Finally, I got back the paper drafts at the beginning of September. I input the few remaining corrections -- adding a few missing periods in footnotes, and correcting some minor spacing issues -- and then sent the PDF file to Wells Bindery in Waltham, one of the binderies that Harvard uses for thesis and dissertation work. The hard copies won't be ready for a few weeks, but the content is final. Here's the complete text of the final grade report, written by Prof. Johnston:
This is a first-rate thesis. Lamont worked extraordinarily hard to develop a range of sophisticated quantitative content analysis methods in order to test their usefulness in adjudicating academic debates about the nature of Chinese foreign policy. In particular he used these methods to test whether Chinese diplomacy toward Vietnam from the late 1970s to the end of the Cold War was based on anti-Soviet motivations or based on distinct concerns about Vietnamese influence in Southeast Asia. Lamont did an excellent job of seeking out and using information about content analysis techniques from a number of top experts on content analysis at Harvard and elsewhere. He showed a great deal of creativity in playing around and perfecting the methods and he also demonstrated acute sensitivity to the analytical downsides of these methods.
I've temporarily archived it at the following location: Making a Case for Quantitative Research in the Study of Modern Chinese History: The New China News Agency and Chinese Policy Views of Vietnam, 1977-1993. I hope to identify a permanent electronic archiving solution in the next few months (Update: The thesis is now available through UMI/ProQuest). A bound copy will be sent to the Extension School as well, and I assume it will be filed in either Grossman Library or the Archives.

There are a lot of people who I'd like to acknowledge here, in addition to Prof. Johnston and Dr. Ostrowski. Here are some brief summaries of how they contributed to my research:
  • Drs. Doug and Joe Bond of the Weatherhead Center for International Affairs: The "Bond brothers" taught my graduate proseminar in 2003, and introduced me to modern mass media content analysis techniques.
  • Sally Hadden, Associate Professor of History and Courtesy Professor of Law at Florida State University and a Harvard Summer School instructor in the history of the Old South: She taught me how to prepare high-quality précis, which have been hugely useful in documenting the literature used in my research and cited in my thesis.
  • Philip Kuhn, the Francis Lee Higginson Professor of History and of East Asian Languages and Civilizations: I took two classes with Prof. Kuhn that relate to modern Chinese history (China in Modern Times in 2003, and Modern Chinese Emigration in 2005), and he was the first Harvard instructor to evaluate a computer content analysis (CCA) that I had designed on my own based on NCNA/Xinhua data.
  • Will Lowe, formerly of the Weatherhead Institute's Identity Project, and now of the University of Nottingham: I never spoke with Will in person, but I have communicated with him by email several times. His free, open-source text analysis program, Yoshikoder, was one of the three software tools that proved instrumental to my research (the others were Excel and LexisNexis Academic, but I don't know who to thank for those!)
There are two other constituencies I'd like to thank here. One is my family, including my parents. But my wife deserves an extra-special thanks. I'll excerpt from the dedication that appears on page viii of my thesis, which sums up the sense of appreciation -- and love -- I have for her:
I would like to thank my wife Nicole, who has been the most patient and supportive witness to my academic journey over the past four years. There have been hundreds of nights and weekends that I have spent in my study, conducting research or writing, time that I otherwise could have spent with her and our two small children, yet she never once protested. I hope that I can reciprocate some day, but in the meantime, I would like to dedicate this thesis to her.
Lastly, I'd like to thank all of you. When I started Harvard Extended back in May of 2005, I had no idea that it would attract so much interest: The hundreds of pages on this blog have been viewed more than 100,000 times (my counter reads 85,891, but I didn't activate it until April 2006, nearly one year after I started it). Thousands of people have seen it. Many have been drive-bys or lurkers, but some of you have left comments or sent emails to give support. I've even met a few of you in person. The words of encouragement have been important, but knowing that I have had this audience has been a strong motivator as well. I would have done the thesis without the blog, but I probably would have been much slower if it hadn't been for all of you looking over my virtual shoulder. Regular updates were required, and this really forced me to consider the progress of my thesis and research on a weekly or monthly basis, and plan the next steps, even if I wanted to procrastinate or take a break from my studies. So thank you, thank you, thank you! I'll continue to maintain this blog for my next (and last) Extension School class: Survey of Publishing: From Text to Hypertext. It starts tomorrow. Stay tuned!

Friday, September 07, 2007

Quick Yoshikoder/General Inquirer update

In August, I described how I was constructing a modified General Inquirer negative dictionary to use with Yoshikoder, in order to perform a computer content analysis of press coverage of Second Life. I actually published the results on one of my other blogs, I, Lamont:
So, what does the data mean? The BW articles that were published in the latter part of 2006 generally had a lower percentage of negative terms than those published in the first four months of 2007. This agrees with the anecdotal observations by myself and a few other sources that BW hyped Second Life in late 2006.

However, the negative rates from the early part of 2006 were surprisingly high. In May 2006, the rate approached 5%, and that was the same month BusinessWeek made the famous pronouncement that "Virtual worlds abound in useful business applications." The analysis suggests that there was actually a stronger negative thread running through the BW coverage during this time, although that apparently dropped away during the summer, when the negative rate dipped to about 2.5% in August.
There are more data points, an Excel chart, and some notes about why I think the quality of the Yoshikoder-derived data is suspect over on my other blog.
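For readers wondering what a "negative rate" like the 5% for May 2006 or the 2.5% for August means in practice: it is the share of words in a month's articles that match an entry in the negative dictionary. The sketch below shows that calculation under stated assumptions. The four-entry dictionary is hypothetical (the real modified GI list has roughly 2,000 entries), the tokenizer is a simplification that may not split words the way Yoshikoder does, and pooling all of a month's articles before dividing is one reasonable choice, not necessarily how Yoshikoder or my spreadsheet aggregated the numbers.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical mini-dictionary for illustration only; the modified GI negative
# list described on this blog has roughly 2,000 entries.
NEGATIVE_PATTERNS = ["fail*", "problem*", "hype", "empty"]


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a simplification; Yoshikoder's
    own tokenizer may split words differently)."""
    return re.findall(r"[a-z']+", text.lower())


def is_negative(token: str, patterns=NEGATIVE_PATTERNS) -> bool:
    """True if the token matches any pattern; '*' matches any run of characters."""
    return any(fnmatchcase(token, p) for p in patterns)


def negative_rate(text: str, patterns=NEGATIVE_PATTERNS) -> float:
    """Fraction of tokens in one text that match the negative dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(is_negative(t, patterns) for t in tokens) / len(tokens)


def monthly_rates(articles, patterns=NEGATIVE_PATTERNS) -> dict[str, float]:
    """Pool all articles from the same month, then compute one rate per month.
    `articles` is an iterable of (month_label, text) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for month, text in articles:
        tokens = tokenize(text)
        totals[month] += len(tokens)
        hits[month] += sum(is_negative(t, patterns) for t in tokens)
    return {m: hits[m] / totals[m] for m in totals if totals[m]}
```

For example, "The hype failed and problems remain" has six words, three of which ("hype", "failed", "problems") match the toy dictionary, so `negative_rate` returns 0.5. Pooling by month means a long article counts for more than a short one; averaging per-article rates instead would weight every article equally.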

Tuesday, September 04, 2007

Back from State of Play V: Conference recap

For the past two weeks, I've been meaning to write about the State of Play V conference in Singapore. I gave a brief report about the opening night's entertainment (a documentary about Second Life) but I also wanted to talk about what happened over the following two days of the conference. It was the first time I attended State of Play, and it really was an eye-opening experience for me.

I only became aware of the extensive academic interest in virtual worlds relatively recently, through my Terra Nova experience, and reading Edward Castronova's Synthetic Worlds, T.L. Taylor's Play Between Worlds, and Nick Yee's MMORPG research. Many virtual world researchers were on hand to discuss their work in Singapore. My panel included Henrik Bennetsen, a Stanford researcher who has spent the better part of six months inside Second Life; Aleks Krotoski, a Guardian columnist and University of Surrey PhD candidate who is studying social networks and online social influence; and anthropologist Thomas Malaby, an associate professor at the University of Wisconsin-Milwaukee who is in the midst of writing an ethnography of Linden Lab and its relationship to Second Life. I also met Ted Tschang, a Singapore Management University professor who has conducted some very interesting research into video game development.

The panel went well. It was entitled "Understanding Virtual World Inhabitants", and was described as follows:
As the virtual world landscape matures, industry and academic researchers are developing systematic methods of measuring user behaviors and understanding resident attitudes. This panel explores the value of quantitative and qualitative approaches to such investigations.
SoP V co-organizer Dan Hunter led the panel, which was in presentation format with a Q&A at the end. The other panelists gave recaps of their respective research methodologies. I talked about the qualitative and quantitative approaches used by journalists, speaking from my perspective as a Computerworld editor and a graduate student conducting media-related research at the Harvard Extension School. My main points: There is some stellar coverage of virtual worlds in the popular press and industry publications (I pointed to Wired and the New Yorker's Will Wright interview), but for the most part, journalists are limited by the amount of time they can spend conducting research, by restrictions relating to length and editorial focus, and by problems finding and using quantitative research. Sensationalism, generalization, and poor use of statistical data are problems in many countries. I was able to give several examples from the American, Chinese, and Taiwanese media.

I concluded that the news media will play a major role in shaping the attitudes and understanding of the 90+ percent of the world's population that currently has no concept of social or gaming virtual worlds. I also revealed the results of some database searches I conducted, which support this conclusion: According to LexisNexis Academic, the number of references to "virtual world" or "virtual worlds" in "major US and world publications" (consisting of English-language newspapers and magazines from all parts of the world) has trended as follows, over the last three years:

July 2005: 45 results
July 2006: 81 results
July 2007: 199 results

I also searched Factiva for 虚拟世界 (xu1ni3shi4jie4, the simplified Chinese for "virtual world") in all languages, all companies, and all regions (which indexed results from publications in China, plus a few in Hong Kong and Singapore), and came up with the following numbers:

All of 2004: 271 results
All of 2005: 553 results
All of 2006: 624 results
2007 to June 30: 472 results
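To put the two sets of counts on a common footing, it helps to express them as year-on-year percentage changes. The sketch below does that using only the figures listed above. One wrinkle: the 2007 Factiva figure covers just six months, so the script also produces a naive full-year projection by doubling it. That projection is an assumption (that the second half of 2007 looks like the first), not a search result, and none of these numbers control for the databases adding news sources over time.

```python
def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def year_over_year(counts: dict[int, int]) -> dict[int, float]:
    """Percentage change for each year relative to the previous year in `counts`."""
    years = sorted(counts)
    return {cur: pct_change(counts[prev], counts[cur]) for prev, cur in zip(years, years[1:])}


# Counts as reported in this post.
lexisnexis_july = {2005: 45, 2006: 81, 2007: 199}
factiva_full_years = {2004: 271, 2005: 553, 2006: 624}
factiva_2007_first_half = 472

# Naive annualization, ASSUMING the second half of 2007 matches the first half.
# This is a projection, not a measurement.
factiva_2007_projected = factiva_2007_first_half * 2

for label, counts in [("LexisNexis (July)", lexisnexis_july),
                      ("Factiva (full year)", factiva_full_years)]:
    for year, change in year_over_year(counts).items():
        print(f"{label} {year}: {change:+.1f}%")
print(f"Factiva 2007, projected full year: {factiva_2007_projected}")
```

Run as-is, it reports increases of about 80% and 146% for the July LexisNexis counts, about 104% and 13% for the Factiva full-year counts, and a projected 944 Factiva results for 2007, which would be roughly half again the 2006 total if the pace held.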

Assuming that the higher numbers reflect increased coverage, as opposed to the databases including more news sources, the data indicates that more people are indeed being exposed to virtual world-related concepts through the mass media. It will be interesting to see how their perspectives on virtual worlds, and on acceptable behavior in these worlds, are shaped by what they see in the news in the years to come.

Besides the academics, State of Play V had large legal and industry contingents. The legal focus should come as no surprise, considering the history of the conference and its organizers, which include the Harvard Law School's Berkman Center, Yale Law School, and New York Law School. The industry representation was dominated by people and companies working with social virtual worlds -- Second Life, There.com, HiPiHi -- as well as several marketing and consulting firms. I've already talked about There.com on Computerworld, and hope to discuss HiPiHi in a later post here or on my Computerworld blog.

There are also supposed to be "video timecapsules" posted to the SoP V website at some future date. Henrik and I taped an interesting, half-hour discussion about Second Life, emerging software and hardware technologies, and issues relating to media coverage of virtual worlds. I'll post a link when it goes online.

Many thanks to Dan Hunter, Aaron Delwiche, and the staff of Harvard's Berkman Center for making my trip to Singapore possible!

Tuesday, August 28, 2007

When statistical analysis gets scary

Statistical analysis finds evidence of human-to-human bird flu transmission, reports the Fred Hutchinson Cancer Research Center:
The researchers based their findings on a cluster of eight flu cases within an extended family in northern Sumatra. Using a computerized disease-transmission model that took into account the number of infected cases, the number of people potentially exposed, the viral-incubation period and other parameters, the researchers produced the first statistical confirmation of humans contracting the disease from each other rather than from infected birds.

The cluster contained a chain of infection that involved a 10-year-old boy who probably caught the virus from his 37-year-old aunt, who had been exposed to dead poultry and chicken feces, the presumed source of infection. The boy then probably passed the virus to his father. The possibility that the boy infected his father was supported by genetic sequencing data. Other person-to-person transmissions in the cluster are backed up with statistical data. All but one of the flu victims died, and all had had sustained close contact with other ill family members prior to getting sick -- a factor considered crucial for transmission of this particular flu strain.
The close cousin of this type of research and analysis is predictive analytics -- and I find it somewhat alarming that a Google search for the following terms:

predictive analytics disease

... turns up 209,000 English-language pages.

Sunday, August 05, 2007

Preparing the General Inquirer negative dictionary for Yoshikoder

I've been spending my free moments this weekend creating a Yoshikoder-friendly version of the General Inquirer negative dictionary used for computer content analysis of political texts. It entails adding wildcards, which Yoshikoder recognizes. This means that the dictionary will be far more sensitive to variations of common negative terms. The creators of the GI dictionary got some variants -- for instance, "exasperate" and "exasperation" -- but missed many other obvious ones, such as "exasperates" and "exasperating". Using "exasper*" will catch these terms.

Of course, wildcards don't work for every word. For instance, "envy" and "envious" could be replaced by "env*", which would get variants such as "envies," but would also catch unrelated words with neutral or even positive meanings -- "envelope," "envision", etc. In this case, I simply added "envies" to the list, rather than using a wildcard.

Converting the General Inquirer dictionary is no easy task. There are 2000 words in the original dictionary that my thesis director gave me (although I see another version contains 2291 words), and each one requires manual review to ensure that wildcards are effectively used and don't introduce unwanted terms into the content analysis that I am planning -- a review of press coverage of Second Life in the past 18 months. Although the GI dictionaries were originally created to examine political texts, I believe they can be used to evaluate other types of text content as well. The GI negative dictionary doesn't contain some of the terms that one typically sees in American or British media articles about new technologies, but it does have a very solid baseline list of negative terms that one might see anywhere.
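The manual review described above boils down to one question per entry: which words will this pattern actually catch? A small script can help answer it by testing each wildcard entry against a list of words and flagging anything the reviewer did not intend. The sketch below assumes that "*" means "zero or more letters", which is how the wildcard is used in these examples; Yoshikoder's exact matching rules may differ, so check its documentation. The word list is hypothetical and stands in for the vocabulary of a real corpus.

```python
import re


def entry_to_regex(entry: str) -> re.Pattern:
    """Compile a dictionary entry into a regex for whole-word matching.

    ASSUMPTION: '*' stands for zero or more letters; Yoshikoder's own
    wildcard rules may differ.
    """
    pieces = [re.escape(piece) for piece in entry.lower().split("*")]
    return re.compile("[a-z]*".join(pieces))


def matches(entry: str, vocabulary) -> list[str]:
    """All words in `vocabulary` that the entry would count, sorted."""
    rx = entry_to_regex(entry)
    return sorted(w for w in vocabulary if rx.fullmatch(w.lower()))


def unintended(entry: str, vocabulary, intended) -> list[str]:
    """Words the entry catches that the reviewer did not mean to include."""
    return [w for w in matches(entry, vocabulary) if w not in intended]


# Hypothetical word list standing in for the vocabulary of a real corpus.
vocab = ["exasperate", "exasperation", "exasperates", "exasperating",
         "envy", "envies", "envious", "envelope", "envision", "environment"]

print(matches("exasper*", vocab))
print(unintended("env*", vocab, intended={"envy", "envies", "envious"}))
```

The first call shows "exasper*" picking up all four variants of "exasperate"; the second flags "envelope", "environment", and "envision" as collateral matches for "env*", which is exactly why listing "envies" explicitly is the safer choice. Running every entry against the full vocabulary of the Second Life article set would turn the 2,000-word review into a scan of a much shorter list of flagged words.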

To see how I used Yoshikoder for my thesis research, check out the following posts:


Thesis update: One small step completed, but still a long way to go

Thesis update: Revising proposal, going granular with Yoshikoder

Thesis update: A eureka moment

Thesis update: Chapter 3 (draft) completed

Tuesday, June 12, 2007

Who says data visualization has to be boring?

This should appeal to anyone who works with quantitative research and is a music fan: It's a blog devoted to entertainment-related data visualizations.

Emo+Beer=Busted Career is hard to describe unless you visit the site. One example color-codes newspaper reviews of pop music according to whether each sentence within the review was informative, positive, negative, or a joke. Another uses colored bubbles of various sizes to describe the blogger's Lil Wayne tape collection.

The blogger in question is named Andrew Kuo ("earl boykins"), and he even got a write-up in the New York Times, which also supplied a summary of visualizations that apply to one of Kuo's favorite artists, Bright Eyes (Conor Oberst). See some of the charts which the Times gathered, such as "Quality Arc of Each Show," "Number of Times Certain Entertaining Phrases Were Shouted," and "Number of People Onstage at the End of the Encore."

Sunday, June 10, 2007

UMass Boston and bias in the Boston Globe, continued

I wanted to post a brief follow-up to an issue that came up last year, when a student group at the University of Massachusetts, Boston, claimed bias in the Boston Globe. The story was picked up by Universal Hub, and drew a lot of community comments. The group used stats from LexisNexis to support their case, claiming that "the Boston Globe has an established pattern of seriously underreporting events at UMass Boston--a large public college--when compared to its generous coverage of large private colleges in the Boston area."

As regular readers of this blog know, I have used LexisNexis in my thesis research and class papers. I collected additional information from LexisNexis that better described the patterns of Boston Globe coverage relating to UMass Boston and several other colleges in the area, including Harvard. The data supported the UMass group's claim of bias/favoritism toward different colleges and universities in the area. You can see the results here.

At the time, I anticipated that the public shaming of the Globe would encourage it to pay more attention to the news, events, and research coming out of UMass Boston. So far, that hasn't happened. This afternoon, I did some more LexisNexis searches, comparing references to UMass Boston and MIT, and found that the Globe's coverage of UMass Boston has actually declined compared with earlier years. (Harvard, BC, BU, and Northeastern were left out of this survey, owing to the tendency of sports-related happenings to inflate the results.)

Here's what I found, when I compared the January 1st to June 11th UMass and MIT coverage for each year, starting in 2004:

MIT or "Massachusetts Institute of Technology" in the headline

2004: 43
2005: 32
2006: 26
2007: 32

UMass in headline, and "University of Massachusetts at Boston" or "UMass Boston" in the full text, but not Dartmouth or Lowell or Amherst in full text (this allows for the short version of UMass in headline, which copy editors and editors prefer, but the article is about UMass Boston, as opposed to UMass Amherst, UMass Lowell, and UMass Dartmouth)

2004: 5
2005: 19
2006: 7
2007: 2
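The UMass Boston criteria above combine three conditions, and the exclusion clause is what makes the count conservative. The sketch below restates that boolean logic as a Python predicate over hypothetical article records. It is plain case-insensitive substring matching, not LexisNexis search syntax, and it assumes the full text includes the headline.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical article record; `full_text` is assumed to include the headline."""
    headline: str
    full_text: str


BOSTON_NAMES = ("university of massachusetts at boston", "umass boston")
OTHER_CAMPUSES = ("dartmouth", "lowell", "amherst")


def is_umass_boston_story(article: Article) -> bool:
    """Mirror the search criteria: 'UMass' in the headline, a UMass Boston
    name somewhere in the text, and no mention of the other campuses."""
    headline = article.headline.lower()
    text = article.full_text.lower()
    return (
        "umass" in headline
        and any(name in text for name in BOSTON_NAMES)
        and not any(campus in text for campus in OTHER_CAMPUSES)
    )


story = Article("UMass names new chancellor",
                "UMass names new chancellor. UMass Boston trustees voted...")
print(is_umass_boston_story(story))
```

The last condition explains one source of undercounting: an article about UMass Boston that mentions Amherst even in passing drops out, as would any article quoting someone with the surname Lowell.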

So, while coverage of MIT has actually increased when compared with the same period last year, and is comparable to the 2005 level, articles about UMass Boston are not only rare, they are considerably less common than they were last year.

Of course, it can be argued that this is not a true measure of UMass Boston-related coverage, as I am excluding all of the articles that may also cite the other University of Massachusetts campuses. Furthermore, there may have been some UMass-related news that did not mention the school's name in the headline.

Others may point out that this is not a fair comparison -- MIT is one of the largest research universities in the United States, and is involved with several high-profile national initiatives that generate lots of news, such as the Broad Institute.

But even if you exclude consideration of MIT, and just concentrate on the results for UMass Boston, the trend is clear: The city's largest newspaper has reduced coverage of a major public university on its own doorstep.

Incidentally, I was prompted to look into this issue by an interesting discussion taking place on the Extension Student discussion forum. I started the thread after seeing two of Harvard's own publications -- the Harvard Gazette (a publication of the Harvard News Office, I believe) and the Harvard Crimson (a Harvard College-oriented newspaper) -- placing the Harvard Extension School last in Commencement-related articles.

Update:


Uncomfortable with the possibility that certain types of articles about UMass Boston were being excluded because of no mention of the school in the headline, I expanded the search criteria to include references to "University of Massachusetts at Boston" or "UMass Boston" in the headline *or* lead paragraph. The results for the year-on-year searches from January 1 to June 11 since 2004:

2004: 30
2005: 52
2006: 22
2007: 24

The results for MIT or "Massachusetts Institute of Technology" in the headline or lead paragraph:

2004: 179
2005: 166
2006: 187
2007: 158

The problem with including the lead paragraph results is that many articles are not specifically about the university in question. For instance, one of the 2007 articles that mentioned UMass Boston in the lead paragraph was an editorial that only mentioned the school because of the UMass Boston Commencement speech by Gov. Patrick. The real focus of the editorial was community colleges across Massachusetts.
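One way to compare the two searches on a common footing is to express the UMass Boston count as a fraction of the MIT count for the same period. This partly cancels out year-to-year swings in how much the Globe published overall, under the assumption that such swings affect both schools about equally. The sketch below uses only the counts reported in this post.

```python
# Counts reported in this post (January 1 to June 11 of each year).
headline_only = {
    "UMass Boston": {2004: 5, 2005: 19, 2006: 7, 2007: 2},
    "MIT": {2004: 43, 2005: 32, 2006: 26, 2007: 32},
}
headline_or_lead = {
    "UMass Boston": {2004: 30, 2005: 52, 2006: 22, 2007: 24},
    "MIT": {2004: 179, 2005: 166, 2006: 187, 2007: 158},
}


def relative_coverage(counts: dict[str, dict[int, int]]) -> dict[int, float]:
    """UMass Boston articles per MIT article, year by year."""
    umb, mit = counts["UMass Boston"], counts["MIT"]
    return {year: umb[year] / mit[year] for year in sorted(umb)}


for label, counts in [("headline", headline_only),
                      ("headline or lead", headline_or_lead)]:
    ratios = relative_coverage(counts)
    print(label, {year: round(r, 2) for year, r in ratios.items()})
```

By this measure, UMass Boston went from about 0.59 headlines per MIT headline in 2005 to about 0.06 in 2007, and from about 0.31 to about 0.15 when lead paragraphs are included. The broader search softens the decline but does not reverse it.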

Thursday, May 31, 2007

ALM Thesis Forum recap

The ALM Thesis Forum wrapped up this week. I spoke on the second night (social sciences), but I also attended the first night, which featured the creative writing and literature concentrators. I was unfortunately unable to attend the third night of the forum (IT concentrators).

What was so special about the forum? For everyone in the audience, the thesis forum was a chance to see results of the candidates' research and learn about the different methodologies, processes, obstacles, and special opportunities that came into play. For me, it was the opportunity to share my research with a new audience and meet the other students who have been going through the same challenges I have over the past two years. The fact that three other social sciences theses were China-focused also made it special (but I am kind of biased on that point!)

For those readers who are in the early stages of their thesis research, consider registering for next year's forum when the notice goes out. The format is casual -- it's really up to the individual speakers to decide what they want to present and how they want to present it. Some people read from a script, while others talked from memory or used PowerPoint slides as a reference. Many people had visuals to share, such as charts and photographs. Most presentations were about 15-20 minutes long, with an additional five-minute question and answer session. The audience was receptive, and not too large -- I'd say about 50 people showed up on the first two nights.

For students who have recently started the ALM program, and haven't really planned their theses, the forum is a great opportunity to learn about some of the research possibilities and various ways to approach your research questions.

Many thanks to HESA and the Harvard Extension School for putting this together!

Monday, May 21, 2007

Counterfactual reasoning, gaming, and Niall Ferguson

In 2003, during my ALM proseminar at the Harvard Extension School, the instructors (Doug and Joe Bond, of Harvard's Weatherhead Center) introduced the class to a fascinating research technique: Counterfactual reasoning.

How does it work? Simply put, using "counterfactuals" to study history or government policy entails applying an alternate reality scenario to a real-world situation. Then, using logic or knowledge of other issues, researchers can evaluate what factors were most important to the real-world situation.

For instance, counterfactual reasoning based on the following scenario allows scholars to evaluate President Kennedy's role in determining U.S. policy in Vietnam in the 1960s:

"Had Kennedy not been assassinated, would the United States have escalated its military involvement in Vietnam in the mid-1960s?"

Authors and scriptwriters have had a field day with counterfactuals, often in the form of "alternate world" fiction, such as Robert Harris' Fatherland (what if Germany had defeated Britain, and Hitler had lived?).

Some strategy-based videogames also incorporate counterfactual scenarios. And it turns out there is academic interest in such games. Wired's Clive Thompson reports that Harvard historian and counterfactual researcher Niall Ferguson was so fascinated by Making History, "a game where players run World War II scenarios based on exhaustively researched economic realities of the period," that he is helping advise its creator, game studio Muzzy Lane, on a new game series that will let players "model modern, real-world conflicts such as Iraq, Afghanistan and the nuclear confrontation with Iran."

Counterfactual games also tie in with the "Serious Games" movement, as reported by Serious Games Source.

Sunday, April 22, 2007

Content analysis advice

I came across a great quote by Philip Stone, the author of the General Inquirer content analysis program:
"Doing content analysis by hand will reduce even the most fanatical post-modernist to pleading for a computer."
The source of this quote is a very good overview of CCA/CATA techniques.

Also, this is the 300th post on the Harvard Extended blog in nearly two years. On average, I have been posting about once every two days. If I have the time, I'll compose a "best of" post in the next few weeks, to point new readers to some of the more interesting pieces. In the meantime, the best way to navigate this blog is to follow the topic tags at the bottom of each post, which will take you to a reverse-chronological list of all of the posts in that particular topic.

Thursday, April 19, 2007

New media projects: Web video and Tech Dispenser

One of the best aspects of my current job at Computerworld is that I am involved in many new media projects for the Computerworld website. I was hired in early 2005 to get Computerworld's blogging area and editorial webcasts off the ground, but since then I have also helped develop Computerworld's podcasts as well as user-generated/community content offerings such as Shark Bait. Currently, I am involved in two new editorial initiatives: Web video programming and Tech Dispenser.

The Web video program is a sponsored daily recap and interview show from one of Computerworld's largest events, Storage Networking World. The event ended this morning, and we created three episodes in all. The second episode will give you a good idea of what the program is about in terms of the format and content:

SNW In Focus, day 2: Mini SANs and the UK's fastest supercomputer

This is not the first time I've worked with video -- my first job in journalism, back in the mid-1990s, was working as a newswriter and narrator for an English-language TV program at the China Television Company (CTV, 中國電視公司 or 中視 in Chinese). Then, as now, producing professional-quality video programming was an extremely labor-intensive effort, requiring a great deal of teamwork and coordination. I know there is a big movement toward vlogging and user-submitted video content a la YouTube, but anyone interested in sponsoring or creating professional, TV-quality content should know that it's not as easy as it looks. My colleague Lucas Mearian and I were the faces of Computerworld Events: SNW In Focus, but behind the scenes there was a video production crew working full time, not to mention additional help from editors and online production staff back at Computerworld HQ in Framingham. Despite all of the work required, we consider the program a success, and look forward to doing similar editorial video programs in the future.

The second new initiative that I am involved with at Computerworld is Tech Dispenser. This is an editor-driven blog aggregator that I think could be a useful alternative to traditional, algorithm-driven aggregators like Megite.

Wait a second, you say: Aren't you the same guy whose research interests include advanced data-mining technologies and computer content analysis? What have you got against algorithms and existing aggregators that highlight interesting news and blog content in the giant, semi-structured database that is the Web?

My answer to that is best summed up in a recent post on my Computerworld blog:
The aggregators are extremely efficient in finding and highlighting news or topics of discussion, but there is a flaw that all share: An inability to identify quality content. Computers are good at counting the number of links pointing to a specific blog post, or measuring the number of topical keywords in a news article. But they are incapable of spotting a scoop, let alone an elegant analysis of a technology trend. Hence, we see lots of highlighted articles and blog posts on the aggregator sites that are simply repeating what someone else has already said, or weak writing samples that are a waste of readers' time. A few sites use deceptive SEO techniques and other weaknesses in the algorithms to manipulate the aggregators to get their articles or posts to the top positions, and on several occasions I have seen suspected astroturfing campaigns highlighted on the blog aggregators.
Tech Dispenser hasn't launched yet, but I am interested in seeing how the tech blog community reacts to the idea. Besides driving traffic to blogs that participate in the Tech Dispenser blog network, the site also includes a revenue share model that should appeal to many tech blog writers.