Verge Studios Blog about Graphic Design, Web Design, and Joomla

Defining 'Intentional Aboutness' in Web Documents Using the Open Graph Protocol

In Western society’s endless pursuit to organize knowledge, we have developed several strategies that allow us to index and retrieve information easily. One such method is to meticulously attribute thoughts and ideas to the individuals who developed them. Another strategy is to group ideas into common thematic areas and create associations between them. The physical manifestations of these methods can be found in any modern Western library or bookstore, as one peruses the aisles, sorting mentally through last names alphabetically or finding oneself in a physical space reserved for a certain type of material. Undoubtedly, such methods of organization have proven to be wildly successful, and continue to be so, as the majority of information seekers eventually find what they are looking for. A primary reason for this success is that subject analysis is easily undertaken. Subject analysis manoeuvres the delicate balance between what the author perceives the idea to be and what the audience believes it to be, allowing for effective classification and cataloguing of information.

Traditionally, librarians perform subject analysis from a neutral perspective, allowing the ideas to speak on behalf of their authors. At the same time, however, librarians may have a greater understanding of where an idea fits in relation to similar ideas and its place within a greater intellectual context, going beyond the limited scope proposed by the author. Therefore, subject analysis functions as a filter between the raw ideas of the author and the general audience, supplying nuanced and contextual information which correlates to the broader intellectual environment. With the rise of content on the Internet since 1990, Western society has witnessed a data explosion of great magnitude, far greater than any librarian could organize or meaningfully analyse. Instead, Western society now relies on either automated machine processes or large crowds of lay individuals to evaluate and organize the vast quantity of information. While both of these processes evolved out of necessity, as a means to address the dramatically increasing amounts of information available, both automated processes and analysis by lay individuals contain significant flaws. It is interesting to note, however, that the recent development of the Open Graph Protocol (OGP) marks a significant leap in how society could potentially approach subject analysis of large data sets such as those found on the World Wide Web (WWW). It is the aim of this paper to explore this emerging subject analysis technique and its possible impacts on the information profession.

Theoretical Context

To fully understand and appreciate the potential of the widespread adoption of the OGP, it is pertinent to first anchor the problems inherent in subject analysis in their historical context, including the debate surrounding the utility and function of subject analysis. Central to this debate is the fact that one cannot expect anyone to perform subject analysis from an objective standpoint, as the bias of an individual towards a subject may overrule the original author’s intentions. In grappling with these issues, W. J. Hutchins (1975) identified an inherent problem with the language used to describe the process itself, judging the word 'subject' to be too confining to adequately give credit to the original ideas of the author (p. 115). Instead, Hutchins suggested that the term 'aboutness' be used to allow the indexer a differentiated approach to the complex task of analysis. Hutchins drew the word 'aboutness' from R. A. Fairthorne (1969), who used the terms 'intentional aboutness' and 'extensional aboutness'. 'Intentional aboutness' includes the holistic notions of the total document and its purpose, while 'extensional aboutness' constitutes the individual elements of a document such as paragraphs, headings, and general syntactic style (as cited in Hutchins, 1977, p. 24). Hutchins therefore drew attention to the notion that the process of subject analysis is highly multi-faceted and that the indexing process is often influenced by the subjective perception of the work in question.

In 1992, B. Hjørland reached the unambiguous conclusion that even the term 'aboutness' could not be viewed as neutral as:

Neither the author's, the reader's, librarian's information specialists, not any other person's (for example the publisher's) point of view or subjective can have any certain objective knowledge about the subject of a document, nor defined the concept of 'subject.' (p. 174)

Following this line of reasoning, it becomes possible to argue that no process of subject analysis could ever be considered objective or unbiased. Furthermore, both 'intentional' and 'extensional aboutness' must be considered in performing subject analysis. Paired with the history of subject analysis on the Web, this argument ultimately explains why the OGP presents a unique solution to the problems relating to subject analysis.

Subject Analysis on the World Wide Web

In designing ways for people to search effectively on the web, search engines initially relied on the 'extensional aboutness' of a web document to perform subject analysis. Through semantic analysis of the content, a computer algorithm could use keyword matching to determine the best approximation of a document’s subject; however, it could not judge the value of the information as a whole. To solve this problem, professional indexers built category lists of websites that they judged useful to a person in search of specific information. In essence, they built upon the traditional work of a librarian and extended it to include web documents. However, given the overwhelming number of webpages, this work is neither inclusive nor representative of the content available. As a solution, initiatives such as the Dublin Core Metadata Initiative suggested the use of a metadata element set to put greater emphasis on the 'intentional aboutness' of documents. As suggested by Weibel, Kunze, Lagoze, and Wolf:

Finding relevant information on the World Wide Web has become increasingly problematic due to the explosive growth of networked resources. Current Web indexing evolved rapidly to fill the demand for resource discovery tools, but that indexing, while useful, is a poor substitute for richer varieties of resource description. (1998, Introduction)

Therefore, more emphasis was placed on what an author wanted a document to be about, giving search engines a better idea of what they should be looking for in evaluating the usefulness of a document.

Unfortunately, such a system bears inherent flaws as was so aptly revealed in the astounding success of the search engine Google, as illustrated by S. Brin and L. Page in 1998. In an introduction to their 'PageRank' algorithm they state:

Another big difference between the web and traditional well controlled collections is that there is virtually no control over what people can put on the web…it is interesting to note that metadata efforts have largely failed with web search engines, because any text on the page which is not directly represented to the user is abused to manipulate search engines. (1998, p. 6)

Brin and Page sought to eliminate this problem by allowing individuals, other than the original author or an employee of a search engine company, to judge the 'aboutness' of a document. To achieve this, they used the aforementioned 'PageRank' algorithm to judge a web document’s 'aboutness' through the number of anchor links or citations that pointed to a site from other sites and the keywords used to describe those links. According to Brin and Page, the “anchors often provide more accurate descriptions of web pages than the pages themselves” (p. 5), implying that 'extensional aboutness', sourced on a massive scale, delivers more accurate results than 'intentional aboutness'. The astounding success of Google certainly supports their claim; however, such systems are still flawed. Because a machine algorithm processes the citations, it is extremely difficult to make a value judgement regarding the 'aboutness'; the algorithm simply assumes that popularity implies accuracy. Even 13 years after its founding, Google still struggles with these issues, as revealed in a recent newspaper story illustrating that Google cannot perform “sentiment analysis” that differentiates praise from denunciation or accurate 'aboutness' from inaccurate 'aboutness' (Segal, 2010).
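The idea of ranking a document by the citations it receives can be illustrated with a minimal sketch. The code below is a simplified link-counting ranking computed by power iteration; it is not Google's actual implementation, it ignores anchor text entirely, and the function name, the damping value of 0.85, the iteration count, and the example site names are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links using simple power iteration.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three sites cite c.example, which cites a.example.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

In this toy graph the most-cited site ends up with the highest score, which also shows the weakness discussed above: the calculation rewards being linked to, regardless of whether the links express praise or denunciation.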

A natural evolution of this mechanism is the shift from having machines assess the validity of the 'aboutness' of a document to having large crowds of people perform the task through social tagging. With the increased availability of online access and the inherent human need to organize personal information and resources, the phenomenon of “crowdsourcing” took shape. “Crowdsourcing,” as defined by its originator Jeff Howe, is “the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call” (2010, Crowdsourcing: A Definition). In this context, it implies that instead of using the subjects defined through citations in other webpages, anyone with online access can assign a document tags or keywords that represent the 'aboutness' of that document. Because a large number of people perform this type of analysis, generally accurate descriptions rise to the top and inaccurate ones are discounted. However, this system has weaknesses: documents may be miscategorized when people agree with a tag or keyword before fully understanding the content of the document. Mass psychology often plays a significant part in how an individual perceives a document, especially if certain subjects are pre-suggested, and could lead to biased subject analysis. Furthermore, social tagging discounts any 'intentional aboutness', as the author of a document can only suggest 'aboutness' through the content, but not clearly define it. It appears only natural that a hybrid of the two approaches would solve this dilemma effectively, and this is where the OGP presents a unique solution.
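How accurate descriptions "rise to the top" in social tagging can be sketched as a simple vote count. The example below is a hypothetical aggregation scheme, not the method of any particular tagging service; the function name, the 50% agreement threshold, and the sample tags are assumptions.

```python
from collections import Counter


def consensus_tags(submissions, min_share=0.5):
    """Return the tags applied by at least `min_share` of the taggers.

    `submissions` holds one list of tags per person. Tags are
    normalized so that "Metadata" and "metadata" count as the same tag,
    and each person counts at most once per tag.
    """
    votes = Counter()
    for tags in submissions:
        votes.update({tag.strip().lower() for tag in tags})

    total = len(submissions)
    # Sort by descending vote count, then alphabetically for stable output.
    ordered = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, count in ordered if count / total >= min_share]


# Hypothetical tags assigned to one document by four different people.
submissions = [
    ["metadata", "Open Graph"],
    ["metadata", "facebook"],
    ["Metadata", "open graph", "recipes"],
    ["open graph"],
]
agreed = consensus_tags(submissions)
```

Here 'metadata' and 'open graph' survive because three of the four taggers applied them, while the outlier 'recipes' is discounted. The same mechanism would also promote a wrong tag if enough people repeated it, which is the weakness described above.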

Open Graph Protocol

Facebook, currently the largest social networking site on the web, created the OGP as a 'Resource Description Framework in attributes' (RDFa) type schema that draws its inspiration from Dublin Core, link-rel canonical, and Microformats. It allows its users to place metadata into web documents, which allows for a standardized categorization and classification of their contents (The Open Graph Protocol, 2010, Footer).

In this aspect, it closely resembles the structure and function of any other RDFa meta schema like Dublin Core. However, it addresses the problem of false information being fed into the metadata through the use of a 'Like' button, which a viewer of the document can click to link the document to their profile and share it with the contacts in their social network. It therefore functions almost like a citation as used by Google to determine the popularity and relevancy of a document. However, the content of the citation is determined by the metadata specified by the author of the document. This metadata is then used to display information about the document on a Facebook user's profile page. This means that if an author of a document decides to add inaccurate metadata to their document, either for nefarious purposes or out of negligence, the user who initially approved the document by clicking the 'Like' button can, in turn, remove it from their profile and essentially disapprove of the 'intentional aboutness' specified by the author. Because the metadata is displayed as part of a Facebook user's social profile, users have a vested interest in ensuring that the documents they publicly approve of and find worthy of sharing are accurately represented.
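To make the author-specified metadata concrete, the following minimal sketch shows how an OGP consumer might read it from a web document. The sample page, its URLs, and the class and function names are hypothetical; only the `og:title`, `og:type`, `og:image`, and `og:url` property names come from the protocol itself, and a real consumer such as Facebook may process pages quite differently.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Return a dict of the Open Graph properties declared in `html`."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical news story whose author declares its 'intentional aboutness'.
SAMPLE_PAGE = """
<html>
<head>
<title>Flood Recovery Continues</title>
<meta property="og:title" content="Flood Recovery Continues in River Valley" />
<meta property="og:type" content="article" />
<meta property="og:image" content="http://news.example.com/images/flood.jpg" />
<meta property="og:url" content="http://news.example.com/stories/flood-recovery" />
</head>
<body><p>Story text.</p></body>
</html>
"""

og = extract_open_graph(SAMPLE_PAGE)
```

The resulting dictionary holds exactly what the author declared, independent of the page's visible text. This is the information that would be displayed alongside a shared link, and that a reader could reject by removing the link from their profile.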

By leveraging the social component, the OGP allows for a strong sentiment analysis of content while the 'intentional aboutness' is retained, as it is specified through the metadata. Through the viral nature of the Facebook system and the immediate sharing of document links to all members of an individual’s social network, the assumption stands that documents of high popularity, with high relevancy and an accurate metadata set, will outrank those that fall behind in either of these categories. A critic could suggest that the verb 'Like' implies a certain amount of emotional positivity towards a subject and that this poses a problem when documents contain emotionally negative content. Examples of such instances include news stories of tragedies or socially difficult subjects such as violence and death. In such an instance the verb 'Like' can be replaced with the verb 'Recommend', which implies a neutral emotional state but retains the implicit understanding that the content itself is accurate.

Currently the OGP allows for 18 properties, which include such varied information as the type of document, representative images, author contact information, and the geo-location of the subject of the document measured in longitude and latitude (Open Graph Protocol Schema, 2010). The combination of metadata made possible by these properties allows for a significant enhancement of the definition of 'intentional aboutness'. Furthermore, because Facebook provides the technological mechanisms for plotting complex metadata such as location information in visually easy-to-understand and user-friendly representations, it allows authors of web documents to give their work a much richer 'intentional aboutness'. A cursory survey of major news outlets in Canada and the United States revealed that 80% of sites use a Facebook implementation of the 'Like' or 'Recommend' functionality. Not all sites made use of the full set of available OGP properties, but most specified a title, an image, and most importantly, an object type. Through these three properties alone, it becomes easy for an OGP consumer, like Facebook, to make a preliminary assessment of the 'aboutness' of a document. These numbers also clearly indicate that the adoption of the OGP standard continues to increase rapidly despite its relatively short existence. Furthermore, they reveal that this new technology has moved beyond its early adopters and is being adopted in the mainstream. Therefore, in terms of the organization and sharing of information, the OGP represents a fundamental advancement in how subject analysis is performed on web-based documents.

Conclusion

The amount of data generated by the increasing number of documents on the web presents a unique challenge to the tasks of information organization and subject analysis. It appears that, more than ever, such tasks must adopt a collaborative nature that extends beyond those trained as information professionals. New information management schemas, such as crowdsourcing and social tagging, allow for large-scale organization and classification with relatively high accuracy; however, these systems remove some of the ownership over the 'aboutness' of a web document from the original author. Furthermore, automated algorithms lack the necessary refinement to perform sentiment analysis. The Open Graph Protocol presents an elegant solution that combines the best of the available options, as it relies on the 'intentional aboutness' of the document, as defined by the author, but leverages the crowdsourcing aspects of social networks. Most impressive, however, is the fact that the Open Graph Protocol evolved out of the trials and successes of multiple approaches, which promises even more innovative ways to perform web-based subject analysis to come.

References

Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Paper presented at the Seventh International World-Wide Web Conference (WWW 1998). Retrieved from http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf

Hjørland, B. (1992). The concept of 'subject' in information science. Journal of Documentation, 48(2), 172-200.

Howe, J. (2010). Crowdsourcing. Retrieved from http://crowdsourcing.typepad.com/

Hutchins, W. J. (1975). Languages of indexing and classification; a linguistic study of structures and functions. London: Peter Peregrinus.

Hutchins, W. J. (1977). On the problem of 'aboutness' in document analysis. Journal of Informatics, 1(1), 17-35.

Open Graph Protocol Schema. (2010). Retrieved from http://opengraphprotocol.org/schema/?format=rdf

Segal, D. (2010, November 28). For DecorMyEyes, bad publicity is a good thing. The New York Times. Retrieved from http://www.nytimes.com/2010/11/28/business/28borker.html

The Open Graph Protocol. (2010). Retrieved from http://opengraphprotocol.org/

Weibel, S., Kunze, J., Lagoze, C. & Wolf, M. (1998). RFC2413: Dublin Core metadata for resource discovery. The Internet Society. Retrieved from http://www.ietf.org/rfc/rfc2413.txt
