The structure and form of folksonomy tags: the road to the public library catalogue

Louise F. Spiteri
School of Information Management, Dalhousie University, Halifax, Nova Scotia, Canada

Abstract

Folksonomies have the potential to add much value to public library catalogues by enabling clients to store, maintain, and organize items of interest in the catalogue using their own tags. The purpose of this paper is to examine how the tags that constitute folksonomies are structured. Tags were acquired over a thirty-day period from the daily tag logs of three folksonomy sites: Delicious, Furl, and Technorati. The tags were evaluated against section 6 (choice and form of terms) of the National Information Standards Organization (NISO) guidelines for the construction of controlled vocabularies. This evaluation revealed that the folksonomy tags correspond closely to the NISO guidelines that pertain to the types of concepts expressed by the tags, the predominance of single tags, the predominance of nouns, and the use of recognized spelling. Potential problem areas in the structure of the tags pertain to the inconsistent use of the singular and plural forms of count nouns, and the incidence of ambiguous tags in the form of homographs and unqualified abbreviations or acronyms. Should library catalogues incorporate folksonomies, they could provide clear guidelines to address these weaknesses, as well as links to external dictionaries and reference sources such as Wikipedia to help clients disambiguate homographs and determine whether the full or abbreviated forms of tags would be preferable.

Keywords: Collaborative tagging, Controlled vocabularies, Folksonomies, Guidelines.

1 Introduction

Digital document repositories such as library catalogues normally index the subject of their contents via keywords or subject headings. Traditionally, such indexing is performed by an authority, such as a librarian or a professional indexer, or is derived from the authors of the documents. In contrast, collaborative tagging (folksonomies) allows anyone to attach keywords or tags freely to content.
Dempsey (2003) and Ketchell (2000) recommend that clients be allowed to annotate resources of interest and to share these annotations with other clients who have similar interests. Folksonomies can thus make significant contributions to public library catalogues by enabling clients to create and organize their own personal information spaces in the catalogue. Clients find items of interest (items in the library catalogue, citations from external databases, external web pages, etc.) and store, maintain, and organize them in the catalogue using their own tags.

To understand these applications more fully, it is important to examine how folksonomies are structured and used, and the extent to which they reflect user needs not found in existing lists of subject headings. The purpose of this study is to evaluate the structure and form of folksonomies against section 6 of the NISO guidelines for the construction of controlled vocabularies (NISO, 2005), which looks specifically at the choice and form of terms.

2 Definitions of Folksonomies

Folksonomies have been described as “user created metadata ... grassroots community classification of digital assets” (Mathes, 2004). Wikipedia (2006) describes a folksonomy as “an Internet-based information retrieval methodology consisting of collaboratively generated, open-ended labels that categorize content such as Web pages, online photographs, and Web links.” Collaboration is commonly attributed to folksonomies. Thomas Vander Wal, who coined the term folksonomy, argues that tagging is done in a social environment (shared and open to others) and that the act of tagging is done by the person consuming the information (Vander Wal.Net, 2005).
It may be more accurate, therefore, to say that folksonomies are created in an environment where, although people may not actively collaborate in creating and assigning tags, they can certainly access and use tags assigned by others.

LA INTERDISCIPLINARIEDAD Y LA TRANSDISCIPLINARIEDAD EN LA ORGANIZACIÓN DEL CONOCIMIENTO CIENTÍFICO

3 Benefits of Folksonomies

Quintarelli (2005) and Fichter (2006) suggest that folksonomies reflect the movement of people away from authoritative, hierarchical taxonomic schemes; the latter reflect an external viewpoint and order that may not necessarily reflect users’ ways of thinking. “In a social distributed environment, sharing one’s own tags makes for innovative ways to map meaning and let relationships naturally emerge” (Quintarelli, 2005). Vander Wal (2006) adds that “the value in this external tagging is derived from people using their own vocabulary and adding explicit meaning, which may come from inferred understanding of the information/object.”

An attractive feature of folksonomies is their inclusiveness; they reflect the vocabulary of the users, regardless of viewpoint, background, bias, and so forth. Folksonomies may thus be perceived to be a democratic system where everyone has the opportunity to contribute and share tags (Kroski, 2006).

The development of folksonomies may also reflect the difficulty and expense of applying controlled taxonomies to the Web: building, maintaining, and enforcing a sound controlled vocabulary is often simply too expensive in terms of development time and of the steep learning curve needed by the user of the system to learn the classification scheme (Fichter, 2006; Kroski, 2006; Quintarelli, 2005; Shirky, 2004). A further limitation of taxonomies is that they may easily become outdated: new concepts or products may emerge that are not yet included in the taxonomy, whereas folksonomies easily accommodate such new concepts (Fichter, 2006; Mitchell, 2005).
Shirky (2004) points out that the advantage of folksonomies is not that they are better than controlled vocabularies, but that they are better than nothing.

4 Weaknesses of Folksonomies

Folksonomies share the problems inherent to all uncontrolled vocabularies, such as ambiguity, polysemy, synonymy, and basic level variation (Fichter, 2006; Golder and Huberman, 2006; Guy and Tonkin, 2006; Mathes, 2004). The terms in a folksonomy may be inherently ambiguous because different users apply terms to documents in different ways. The polysemous tag “port” could refer to a sweet fortified wine, a porthole, a place for loading and unloading ships, the left-hand side of a ship or aircraft, or a channel endpoint in a communications system. Folksonomies do not include guidelines for use or scope notes.

Folksonomies provide no synonym control; the terms “mac”, “macintosh”, and “apple”, for example, are all used to describe Apple Macintosh computers. Similarly, both singular and plural forms of terms appear (e.g., flower and flowers), thus creating a number of redundant headings. The problem with basic level variation is that related terms that describe an item vary along a continuum of specificity ranging from very general to very specific; so, for example, documents tagged “perl” and “javascript” may be too specific for some users, while a document tagged “programming” may be too general for others.

Folksonomies provide no formal guidelines for the choice and form of tags, such as the use of compound headings, punctuation, word order, and so forth; for example, should one use the tag “vegan cooking” or “cooking, vegan”? Guy and Tonkin (2006) provide some general suggestions for tag selection best practices, such as the use of plural rather than singular forms, the use of the underscore to join terms in a multi-term concept (e.g., open_source), following conventions established by others, and adding synonyms.
These suggestions are rather too vague to be of much use, however; for example, under what circumstances should singular forms be used (e.g., non-count nouns), and how should synonyms be linked?

The pitfalls of folksonomies have been well documented; what is missing is an in-depth analysis of the linguistic structure of tags against an established benchmark. While popular opinion suggests that folksonomies suffer from ambiguous and inconsistent structure, the actual extent of these problems is not yet clear; furthermore, analyses conducted so far have not established clear benchmarks of quality pertaining to good tag structure. Although there are no guidelines for the construction of tags, recognized guidelines do exist for the construction of terms that are used in taxonomies. Although these guidelines discuss the elucidation of inter-term relationships (hierarchical, associative, and equivalent), which does not apply to the flat space of folksonomies, they contain sections pertaining to the choice and formation of concept terms, which may, in fact, have relevance for the construction of tags.

5 Methodology

Tags were chosen from three popular folksonomy sites: Delicious, Furl, and Technorati. Delicious and Furl function as bookmarking sites, while Technorati enables people to search for, and organize, blogs. These sites were chosen because they provide daily logs of the most popular tags that have been assigned by their members on a given day. The daily tag logs from each of the sites were acquired over a thirty-day period. A list of unique tags for each site was compiled after the thirty-day period; “unique” refers to the single instance of a tag. The analysis of the tag structure in the three lists was conducted by applying the NISO guidelines for thesaurus construction (NISO, 2005), which are the most current set of recognized guidelines for the construction of controlled vocabularies.
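The compilation of a unique tag list from daily logs, as described above, might be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function names `unique_tags` and `percentage`, the trimming of surrounding whitespace, and the sample data are all hypothetical.

```python
def unique_tags(daily_logs):
    """Collapse a sequence of daily tag logs into one sorted list of unique tags.

    daily_logs: iterable of lists, one list of tag strings per day.
    Assumption: tags are compared as exact strings after trimming
    surrounding whitespace; no case folding or stemming is applied.
    """
    seen = set()
    for day in daily_logs:
        for tag in day:
            cleaned = tag.strip()
            if cleaned:
                seen.add(cleaned)
    return sorted(seen)


def percentage(tags, predicate):
    """Return the share (0-100) of tags for which predicate(tag) is true."""
    if not tags:
        return 0.0
    matching = sum(1 for tag in tags if predicate(tag))
    return 100.0 * matching / len(tags)


# Hypothetical three-day log for one site (not data from the study).
sample_logs = [
    ["design", "css", "blog"],
    ["css", "open source", "design"],
    ["blog", "photography"],
]

tags = unique_tags(sample_logs)
# Share of single-word tags, using "contains no space" as a rough test.
single_word_share = percentage(tags, lambda tag: " " not in tag)
```

Each tag is kept once no matter how many daily logs it appears in, which matches the sense of “unique” given above. The same `percentage` helper could be reused with other tests, for example a manually coded list of homographs.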
While folksonomies are not controlled vocabularies, they are lists of terms used to describe content, which means that the NISO guidelines could work well as a benchmark against which to examine how folksonomy tags are structured, as well as the extent to which this structure reflects the widely accepted norm for controlled vocabularies.

6 Findings

Unless stated otherwise, the number of tags per folksonomy site is 76 for Delicious, 208 for Furl, and 229 for Technorati.

6.1 Homographs

The NISO guidelines recommend that homographs (terms with identical spellings but different meanings) should be avoided as far as possible in the selection of terms (NISO, 2005, p. 32). Homographs constitute 22% of Delicious tags, 12% of Furl tags, and 20% of Technorati tags. Unique entities constitute a significant proportion of the homographs in all three sites, with 71% in Delicious, 43% in Furl, and 55% in Technorati. The most frequently occurring homographs across the three sites consist predominantly of computer-related products, such as Ajax and CSS.

6.2 Single word vs. multiword terms

The NISO guidelines recommend that terms should represent a single concept expressed by a single term or multiword term, as needed (NISO, 2005, p. 35). Single-term tags constitute 93% of Delicious tags, 76% of Furl tags, and 80% of Technorati tags. The preponderance of single tags in Delicious may reflect the fact that it does not allow for the use of spaces between the different elements of the same tag, e.g., open source.

6.3 Types of concepts

NISO provides a list of seven types of concepts that may be represented by terms; while this list is not exhaustive, it represents the most frequently occurring types of concept. Table 1 shows the percentage of tags that correspond to each of the seven types of concepts:

Table 1. Concepts Represented by the Tags

               Delicious   Furl   Technorati
Things         76%         82%    90%
Materials      0%          0%     0.4%
Activities     12%         10%    4%
Events         0%          0%     0%
Properties     8%          6%     4%
Disciplines    4%          3%     1%
Measures       0%          0%     0%

Tags that represent things are clearly predominant in the three sites, with activities and properties forming a distant second and third in importance. None of the tags represent events or measures, and only a fraction of the Technorati tags represent materials. None of the tags fell outside the scope of the seven types of concepts.

6.4 Unique Entities

Unique entities may represent the names of people, places, organizations, products, and specific events (NISO, 2005, p. 36). Unique entities constitute 22% of Delicious tags, 14% of Furl tags, and 49% of Technorati tags. There is no consistency in the percentage of unique entities across the sites: the percentage in Technorati is more than twice that in Delicious and about three and a half times that in Furl. Computer-related products constitute 100% of the unique entities in Delicious, 63% in Furl, and 38% in Technorati. The remainder of the unique entities in Furl and Technorati represent places, people, and corporate bodies.

6.5 Grammatical forms of terms

Table 2 shows the distribution of the grammatical forms of tags:

Table 2. Grammatical Form of Tags

                               Delicious   Furl   Technorati
Nouns                          88%         71%    86%
Verbal Nouns                   5%          6%     4%
Noun Phrases - Premodified     1%          15%    4%
Noun Phrases - Postmodified    0%          2%     3%
Adjectives                     6%          6%     3%
Adverbs                        0%          0%     0%

If all the types of nouns are combined, then 95% of Delicious tags, 94% of Furl tags, and 97% of Technorati tags constitute types of nouns. The grammatical structure of the tags in the three
