As the Internet continues to change the news industry and the methods of production, circulation and consumption, it is ever more critical to understand the emerging trends and news outlets available online. Citizens must make daily choices about what sites to go to for various kinds of news information, but it is largely up to them to figure out which site can best fit their needs at the moment. And in many instances they may be making choices without fully understanding why.
The content analysis element of the 2007 Annual Report on the State of the News Media was designed to sort through the many different kinds of sites that offer news information. What does each kind of site emphasize? Are there common tendencies? Creating the study and analyzing the findings was a multi-step process.
Sample Design and Web Site Capture
To assess the range of news Web sites available, we selected 38 different Web sites that provide such information. The sites were initially drawn from the seven media sectors that PEJ analyzes in each annual report:
In addition, we included one foreign broadcast site (BBC News) and the site of one wire service. (Due to the language barrier, ethnic, non-English-language Web sites were not included in the study.)
The result was the following list of sites:
Sites Studied
ABC News http://abcnews.go.com
BBC News http://news.bbc.co.uk
Benicia News http://www.benicianews.com
Boston Phoenix http://www.thephoenix.com
CBS11 TV http://cbs11tv.com
CBS News http://www.cbsnews.com
Chicago Sun Times http://www.suntimes.com
Crooks and Liars http://www.crooksandliars.com
Daily Kos http://www.dailykos.com
Des Moines Register http://www.desmoinesregister.com
Digg http://digg.com
Economist http://www.economist.com
Fox News http://www.foxnews.com
Global Voices http://www.globalvoicesonline.org
King5 TV http://www.king5.com
Los Angeles Times http://www.latimes.com
Little Green Footballs http://www.littlegreenfootballs.com
Michelle Malkin http://www.michellemalkin.com
MSNBC http://www.msnbc.msn.com
AOL News http://news.aol.com
Google News http://news.google.com
Yahoo News http://news.yahoo.com
New York Post http://www.nypost.com
New York Times http://www.nytimes.com
Ohmynews.com http://english.ohmynews.com
PBS NewsHour http://www.pbs.org/newshour
Reuters http://www.reuters.com
Salon http://salon.com
San Francisco Bay Guardian http://www.sfbg.com
Slate http://slate.com
Time Magazine http://www.time.com
Topix http://www.topix.net
USA Today http://www.usatoday.com
Washington Post http://www.washingtonpost.com
The Week Magazine http://www.theweekmagazine.com
WTOP Radio http://www.wtop.com
Web sites were captured by a team of professional content coders. At each download, coders made an electronic copy and a printed hard copy of each site's homepage as well as of its top five news stories. Prominence was determined as follows:
The biggest headline at the top of the screen is the most prominent story. It may or may not have an image associated with it. The second-most prominent story is the one attached to an image at the top of the screen, if that is a different story from the most prominent one. If there is no image at the top of the screen (or if two significant stories are attached to the same image), refer to the next-largest headline. To determine the next-most-prominent stories, refer first to the size of the headlines and then to their place (height) on the screen. If two stories have the same font size and sit at the same height on the screen, give the story on the left more prominence.
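The ranking rules above can be expressed as a short procedure. The sketch below is illustrative, not part of the PEJ codebook; the field names (`headline_size`, `y`, `x`, `has_top_image`) are assumptions about what a coder would record for each story.

```python
# A minimal sketch of the prominence rules, assuming each story's
# headline size, screen position, and attachment to the top-of-screen
# image have already been noted by a coder.
from dataclasses import dataclass

@dataclass
class Story:
    title: str
    headline_size: int   # font size of the headline
    y: int               # vertical position on screen (0 = top)
    x: int               # horizontal position (0 = left)
    has_top_image: bool  # attached to the image at the top of the screen

def rank_stories(stories):
    stories = list(stories)
    # 1. The biggest headline (ties broken by height, then left-ness)
    #    is the most prominent story.
    lead = min(stories, key=lambda s: (-s.headline_size, s.y, s.x))
    stories.remove(lead)
    ranked = [lead]
    # 2. The story attached to the top-of-screen image comes next,
    #    if it is a different story from the lead.
    image_story = next((s for s in stories if s.has_top_image), None)
    if image_story is not None:
        stories.remove(image_story)
        ranked.append(image_story)
    # 3. Remaining stories: larger headline first, then higher on the
    #    screen, then further to the left.
    ranked += sorted(stories, key=lambda s: (-s.headline_size, s.y, s.x))
    return ranked
```

Note that step 2 lets the image story outrank stories with larger headlines, which is how the rule is worded: the image story is second-most prominent whenever it differs from the lead.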
Stories were defined as:
Capture Timing
Web sites were initially studied from September 18 through October 6, 2006. For that initial review, each site was captured and coded four different times. For two captures, the research team coded for the entire set of variables, both the homepage analysis and the variables related to the content of news stories. The other two rounds of capture were coded only for the variables relating to the content of the lead stories.
Each site was then studied again during the week of February 12-16, 2007, and coded separately. Results for the two time periods were compared. In cases where features had changed, we closely examined the site again to confirm the change or correct inconsistencies. Final analyses were based on the confirmed February site scores.
Coding Scheme and Procedure
To create the coding scheme, we first worked to identify the different kinds of features available online (everything from contacting the author to quickly finding just what you want to receiving your news free) and how they could be measured. After several weeks of exploratory research, we identified 63 different quantitative measures and developed those into a working codebook (see list of primary variables below).
Coding was performed at the PEJ by a team of seven professional in-house coders, overseen by a senior researcher and a methodologist. Coders were trained on a standardized codebook that contained a dictionary of coding variables, operational definitions, measurement scales, and detailed instructions and examples. The codebook was divided into two sections. The first was based on an inventory of the Web site's homepage. That inventory was performed three separate times: twice in September 2006 and once in February 2007. The second component involved coding the content of the news stories themselves. We coded the top five stories for the variables related to the content of the news and took the average score for each variable.
Before coding began, coders were trained on the codebook. Excel coding sheets were designed and used consistently throughout the process. Meetings were held throughout to discuss questions, and where necessary additional captures took place to verify findings.
Coders followed a series of standardized rules for coding and quantifying Web site traits. Three variables deserve specific mention:
1. Multimedia components on the homepage: Coders counted all content items, defined as links to all material other than landing pages or indexes of some sort. Included were narrative text, still photos, interactive graphics, video, audio, live streams, live Q&As, polls, user-based blogs, podcast content and slide shows. The coders then tallied the total number of content items on the page as well as the totals for each media form and entered the percentages for each into the database.
2. Advertisements: In counting advertisements on the homepage, coders included all ads, from obvious banners and flash advertisements to the smaller single-link sponsors of a site. Self-promotional ads were also included in the total. The idea of this variable was to estimate the economic agenda of a given site based on the amount of advertising on the homepage. Advertisements on internal pages were not included in the tally. Because of day-to-day variance in the total number of homepage ads, the final figure was either the average based on all the visits to a site or, in cases where a site redesign had clearly occurred, the latest use of ads.
3. Bylines on blog posts required special rules. A blog entry was coded a "1," the highest score for original content, if it was posted by the blog host (John Amato on Crooks and Liars, for example). If the entry was posted by a regular contributor or staff member, the "story" scored a "2." And if the entry was posted by an outside contributor, carried no byline, or consisted primarily of outside material (an entry, for instance, that simply said, "Read this," followed by an excerpt from another source), the post received a "3," the lowest score on the scale of original stories.
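The three counting rules above can be sketched in a few lines each. These are illustrative reconstructions, not the PEJ coding software; all function names and category labels are assumptions, and the "1" for host-posted blog entries is inferred from the scale's "2" and "3" cases.

```python
# Minimal sketches of the three coding rules; names are illustrative,
# not from the PEJ codebook.
from collections import Counter

def media_percentages(items):
    """Rule 1: given the media-form label of every content item on a
    homepage (e.g. "text", "video", "audio"), return each form's share
    of all content items as a percentage."""
    counts = Counter(items)
    total = sum(counts.values())
    return {form: 100.0 * n / total for form, n in counts.items()}

def final_ad_count(counts_by_visit, redesigned=False):
    """Rule 2: the final homepage-ad figure is the average across all
    visits, or the latest count if the site was clearly redesigned."""
    if redesigned:
        return counts_by_visit[-1]
    return sum(counts_by_visit) / len(counts_by_visit)

def byline_score(author_type):
    """Rule 3: map a blog post's author to the originality scale,
    assuming a host-posted entry scored a "1" (the codebook states
    the "2" and "3" cases explicitly)."""
    scores = {"host": 1, "regular_contributor": 2, "staff": 2}
    return scores.get(author_type, 3)  # all other cases score a "3"
```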
Analysis
In analyzing the data, we were able to group variables into six different areas of Web emphasis: User Customization, User Participation, Multimedia Use, Editorial Branding and Originality, Depth of Content and Revenue Streams.
Customization includes
Participation includes
Multimedia includes
Percent of homepage content devoted to:
Editorial Branding includes
Story Depth includes
Revenue Streams includes
Codes within each variable were translated into a numerical rating from low to high for that particular feature. Then PEJ research analysts produced an Excel template to tally the scores (summing the variables) for each site within the six categories. Thus for each of the six categories, each site had a final score. The range of scores was then divided into four quartiles and sites were marked according to which quartile they fell into.
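The final scoring step, summing a site's variable scores within a category and marking it by quartile, can be sketched as follows. This is one reading of "divided into four quartiles" (equal-width bands across the range of scores, rather than rank-based quartiles); the function name is illustrative.

```python
def quartile_labels(scores):
    """Divide the range of category scores into four equal-width bands
    and mark each site by the band its score falls into (1 = lowest
    quartile, 4 = highest). Assumes the equal-width interpretation of
    the report's quartile split."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / 4 or 1  # guard against all-equal scores
    labels = []
    for s in scores:
        q = int((s - lo) / width) + 1
        labels.append(min(q, 4))  # the top score lands in quartile 4
    return labels
```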