<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-16359864</id><updated>2011-04-21T16:03:49.539-04:00</updated><title type='text'>Metadata Intern</title><subtitle type='html'>One man stepping onto the metaphorical Yellow Brick road to becoming a metadata specialist in a digital library program.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>29</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-16359864.post-113379872058871615</id><published>2005-12-05T11:05:00.000-05:00</published><updated>2006-03-01T23:34:13.040-05:00</updated><title type='text'>Technical metadata can be fun too!</title><content type='html'>I'm in the LETRS lab for the last Monday morning and working on the last project I'll do for this internship. Rather bittersweet actually. 
I'll still have one hour left to fill on Thursday morning to round out to an even 180, but this is my last real protracted period of work.&lt;br /&gt;&lt;br /&gt;This last assignment is to flesh out a processHistory XML template, which my supervisor has started, for the technical metadata that the DLP is using. I'm supposed to create the process history information for encoding the digibeta file to mpeg, which is the source for the end product of streaming video that will eventually be put up online for end users to view. It's a useful assignment and a natural connection to what I was working on Thursday and Friday last week. &lt;br /&gt;&lt;br /&gt;I was given the task of reviewing the MIX metadata scheme that they plan on using for still image technical metadata and writing up my recommendations for implementing this scheme. Judging from the info they were previously collecting, I went with a light level of description approach. I included recommendations for image height, width, compressionLevel, targetID, encoding, resolution information and that sort of thing. I don't think they want to have exhaustive Level 3 records for every image they create, especially since it seems that most of the images they amass at the DLP are scans of book pages that are naturally fairly simple images where the important info is the level of contrast between the light and dark and how crisp the image is. That's the argument I made, and I suppose it was the right one since my supervisor just told me that it looked great. &lt;br /&gt;&lt;br /&gt;This new assignment for the day should keep me busy until 5 at least. Like all metadata tasks, it won't necessarily be the data entry that is time consuming but fixing validation errors.&lt;br /&gt;&lt;br /&gt;I'm looking forward to presenting on this wonderful adventure as an intern metadata librarian on Thursday night. Still haven't written up what I'm going to say, but I've got the general idea. 
I don't want to scare everybody with a screenful of angle brackets, so I might go low-tech and just present from note cards. Five minutes really isn't a very long time and if I use PowerPoint for that I'll have all of five slides, if that. &lt;br /&gt;&lt;br /&gt;That's it for now. Will post one last time on Thursday to wrap up this project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113379872058871615?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113379872058871615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113379872058871615' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113379872058871615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113379872058871615'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/12/technical-metadata-can-be-fun-too.html' title='Technical metadata can be fun too!'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113354534993121620</id><published>2005-12-02T12:42:00.000-05:00</published><updated>2005-12-02T12:43:51.673-05:00</updated><title type='text'>Review #9: the DLESE model for digital library planning</title><content type='html'>In an article for D-Lib Magazine this month, Kim Kastens et al. create a model on which to frame initial planning for a digital library initiative. 
Written as a Q&amp;A, the article might be valuable to a digital librarian just beginning to embark on a new collection initiative or else reworking an existing project so that it may be more effective and efficient. &lt;br /&gt;&lt;br /&gt;There are two main themes to the questions in the article: knowing the audience and knowing the collection that’s being digitized. These two themes ought to define the need that the project is being created to fulfill. While the DLESE (Digital Library for Earth System Education) initiative has many answers that other organizations might find they share, the most interesting and compelling aspect of this article is not necessarily the answers but the questions. These progress from the very general (what is the goal? who is your audience?) to the very specific (will there be sub-collections? what kind of metadata is necessary to fulfill the goal of the collection?). These 12 core questions are intended to focus the initiative on the collection itself and on providing the best level of access for the needs of the end-users. &lt;br /&gt;&lt;br /&gt;Still, one thing that some readers might find surprising is the emphasis on evaluating resources for inclusion in the digital collection. For some collections this might not be quite as important as it was for DLESE, but it ought to be at least one of the concerns addressed during the planning process. Oftentimes, knowing how to evaluate a resource will drive the metadata creation process, as your evaluation system might mirror the end-user’s evaluation purpose (in some ways, although not all). Also, evaluation of the resource is tied to cataloging workflows, as Kastens et al. 
argue. Adding evaluation of the resources into the planning process for a digital initiative will undoubtedly improve every other aspect in turn, focusing the direction on the users and their information needs regarding the collection. &lt;br /&gt;&lt;br /&gt;In the final third of the article, Kastens et al. shift focus from presenting a model for digital initiative planning to a discussion of existing and future challenges that still need to be addressed and solved in the digital library field. It might be troubling for some readers to realize that many of the challenges that have yet to be solved are the same problems that have been dealt with since the beginning of digital libraries. However, given how recently digital libraries began to be developed, it is no cause for dismay that these issues remain open: creating completely accessible interfaces and resources; mobilizing research communities to participate in digital library initiatives; and, most importantly, balancing the needs of end-users for simplicity and the needs of library administrators for precise, rich information in metadata development. The DLESE organization does not have any answers to these challenges; however, one such answer might be standardized metadata schemes and extensively collaborative environments to establish a digital library program that is an integral part of the campus on which it resides.&lt;br /&gt;&lt;br /&gt;See Kastens, Kim. (2005). "Questions and challenges arising in building the collection for a digital library for education." 
D-Lib Magazine, 11(11): Last accessed at http://www.dlib.org/dlib/november05/kastens/11kastens.html on December 2, 2005.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113354534993121620?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113354534993121620/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113354534993121620' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113354534993121620'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113354534993121620'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/12/review-9-dlese-model-for-digital.html' title='Review #9: the DLESE model for digital library planning'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113319687371378746</id><published>2005-11-28T11:54:00.000-05:00</published><updated>2005-11-28T11:54:33.736-05:00</updated><title type='text'>The end is in sight. Or, is it?</title><content type='html'>As I approach the end of the semester and the end of this internship, I feel like my brain has been wrung out over a keyboard. I have stared at infinite angle brackets and decrypted countless needlessly abbreviated error messages, but because of all the aggravations I feel as if I have learned an enormous amount of information in a very short period of time. 
The LETRS lab has become my home for better or worse during this semester and I’ve become quite unafraid of asking endless questions when I get stumped (this has happened a lot during the course of this internship). My brain has been wrung out, yes, but in the process of that wringing it has absorbed an awful lot of new knowledge.&lt;br /&gt;&lt;br /&gt;The last two projects I’ve done are proof of this. The Hohenberger stylesheet was a Sisyphean task of sorts, though in the sense that it was interminable, not pointless. The United Nations/Nobel Peace Prize winners stylesheet was less so, although since this project is just barely at the lifting-off point for the DLP it’s been more an exercise in planning than anything else. &lt;br /&gt;&lt;br /&gt;For the Hohenberger stylesheet, I learned at least two valuable lessons (though I’m sure I learned at least three or four more that I’m just not aware of yet). I learned the power and utility of the XPath toolkit. The trick for this project was learning how to use recursive processing via the parent and child axes and to grasp the concept of hierarchy. I last encountered this when I had a crash course in Java programming, but I think until now I had forgotten most of it, even though hierarchical inheritance is the crux of contemporary Web development. The other trick to successfully completing this project was of course judicious use of xsl:if and xsl:for-each, the latter of which took me some doing to fully grasp despite the prevalence of the similar foreach loop in Perl programming. &lt;br /&gt;The primary problem of this project has been the sheer size of the file: it contains over 8,000 records, which translates to even more individual items. Since MODS, which is the scheme I’m mapping the original EAD records into, is designed to present individual records for complete items, the final output is quite large. 
I never quite believed the ancient geek lament for more power until last week, when I watched in dismay as my trusty Dell Optiplex workstation was brought to its metaphorical knees by the combined demands of my stylesheet and the original EAD file. &lt;br /&gt;Related to the size issue is that it has been nearly impossible to examine the complete file carefully, so I initially missed a whole slew of records buried in the middle of the file that were set off as sub sub groups of some record groups but not of others. This is both the problem and the blessing of EAD: it allows the archivist hierarchical levels of description, but that makes the job of the developer transforming these records into some new standard much more difficult. I also failed to account for the subject elements that appear in a handful of records in the EAD file. Creating a transformation for these was itself an issue due to the nature of recursive processing, which continually goes up and down throughout the file and at first was retrieving every subject element in the entire file and placing them all in each MODS record output. This was solved by the wonderful little tool &amp;lt;xsl:if&amp;gt;. &lt;br /&gt;&lt;br /&gt;Recursive processing was not an issue for the United Nations/Nobel Peace Prize winners project, since the input metadata for this was in Excel spreadsheets, which import nicely into XML as small, neat packages of whole and complete metadata per row. The problems with this project relate to data quality and the subjective use of elements. The individuals who created these Excel files (35 workbooks and about 60 worksheets) made them more-or-less human-readable with seemingly little thought towards machine-readability, and so before they could be made into XML files it required some fiddling and roundabout steps to tidy up the data (removing extraneous spaces and titles). 
The other issue has been that while most of the elements in the original data seem required, there are a few optional elements. Since I’m not completely certain what each of the elements is present for or what each element explains about the items being described, I had to create a series of &amp;lt;xsl:if&amp;gt; statements and &amp;lt;xsl:when&amp;gt; branches, and make many assumptions about documents versus images, titles versus abstracts, and what constitutes technical information. In all, though, this second project was a treat to do after the slog that was transforming the Hohenberger EAD records into MODS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113319687371378746?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113319687371378746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113319687371378746' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113319687371378746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113319687371378746'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/end-is-in-sight-or-is-it.html' title='The end is in sight. 
Or, is it?'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113319335609576143</id><published>2005-11-28T10:55:00.000-05:00</published><updated>2005-11-28T10:55:56.230-05:00</updated><title type='text'>Review #8: Dennis Meissner on finding aid implementation for EAD</title><content type='html'>In his 1997 article for American Archivist, Dennis Meissner presents a compelling argument for caution regarding legacy data when converting to a new standard. He discusses the Minnesota Historical Society’s situation when they were faced with converting to EAD and how they took advantage of the situation to become even more customer-focused than before. &lt;br /&gt;Meissner explains how, in the process of converting their finding-aids to EAD, the MHS was faced with two problems. First, they discovered that many of the elements in their legacy finding-aids did not fit nicely into the EAD logical organization of those elements. He cites the situation with the identifier number, which the MHS originally treated as a string of numbers at the bottom of every finding-aid page. Second, they discovered that the actual structure of their finding-aids was woefully difficult to understand, as it relied on archival expertise and jargon. 
Rather than simply convert these problematic finding-aids into EAD and forget about them, Meissner and his colleagues took the opportunity presented to them to &amp;quot;reengineer&amp;quot; their finding-aids, hence the title of the article, &amp;quot;First things first: reengineering finding-aids for implementation of EAD.&amp;quot;&lt;br /&gt;&lt;br /&gt;For Meissner, the primary problem with the traditional finding-aids is their inherent bias towards serving users of the physical archives, while EAD finding-aids are meant to serve remote users as well as those patrons who actually visit the archives. With the traditional finding-aid, any problems of understanding that a user had could be remedied through user education, and while attempts had been made to carry this over to the Web, there was very little evidence that they were actually successful. With the conversion to EAD, the MHS decided to make the finding-aids more transparent and readable so that a remote user could quickly retrieve the information he or she needed from the document on the computer screen. &lt;br /&gt;&lt;br /&gt;In order to do this, the MHS adopted a customer-centered approach to the creation of the finding-aid that took precedence over the traditional archivist-centered approach. With this as their vision, Meissner and his colleagues proceeded to utilize the structure of the EAD document to create an HTML page from the EAD records that would allow the remote user as well as the physically present patron to quickly read and interpret the finding-aids. 
Foremost, they presented the information from the general (name of the institution and logo) to the specific (item-level descriptions for the collection being described in the finding-aid), with the administrative information tightly packed into a single part of the document.&lt;br /&gt;&lt;br /&gt;Meissner presents an argument that EAD conversion provided the impetus for making cleaner, more transparent finding-aids, and he implies that without this conversion such an exercise (onerous as it was) would not have been pursued. This article presents a useful message to metadata specialists: always keep in mind that the ultimate goal of metadata is the output, and before creating new metadata records it is important to have a vision of what one wants to present to the user through that metadata and, most importantly, how that new output is going to aid the user in his or her search for information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113319335609576143?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113319335609576143/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113319335609576143' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113319335609576143'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113319335609576143'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/review-8-dennis-meissner-on-finding.html' title='Review #8: Dennis Meissner on finding aid implementation for 
EAD'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113318873973581053</id><published>2005-11-28T09:38:00.000-05:00</published><updated>2005-11-28T09:39:00.486-05:00</updated><title type='text'>Review #7: Janice Ruth on EAD development</title><content type='html'>As part of my archives and management class, I’ve been researching Encoded Archival Description (EAD) because it seems to be a very significant part of the metadata world. Back in 1997, shortly after EAD was officially released to the archival community, the American Archivist did a special two-part issue on this most complicated and most useful scheme. While most of the articles are boringly technical (detailed descriptions of what the &amp;lt;frontmatter&amp;gt; element versus the &amp;lt;eadheader&amp;gt; element is and is supposed to do), there are a couple of articles that are extremely interesting and bring up some compelling points about just why EAD ought by all rights to take the archival community by storm. Here is where it becomes funny, for me, reading these arguments eight years after EAD has taken archives, shaken them up, and sparked countless debates about the actual usefulness of this not-so-humble metadata scheme.&lt;br /&gt; &lt;br /&gt;Janice Ruth verges on the boringly technical as she discusses the finer points of the development of EAD and why the working group made the decisions it did. 
She begins by arguing the merits of SGML as an open &amp;quot;technique for defining and expressing the logical structure of documents.&amp;quot; Ruth then goes on to discuss the merits of some of the more significant decisions of the EAD working group and attempts to prove why the EAD DTD is ideally suited to archival use.&lt;br /&gt;In the beginning of her article, Ruth presents an overview of SGML. She explains that the working group created EAD to reflect the content and not the structure of the traditional finding-aids, both because local practices vary so much in how finding-aids actually look and because SGML is better suited to establishing a logical context rather than a strict physical structure (the look and feel) of a document. That the SGML DTD specifies where and when a particular element may appear in the EAD record is not physical structure but logical context for certain bits of information that belong in one element and not the other so that the machine can process it. Finally, she explains that a powerful value of SGML is the ability to create attributes that provide further machine-readable granularity. As a broad summary of the technical merits of SGML, these 1.5 pages are an invaluable justification for the EAD metadata scheme over, say, a legacy Access database or, worse still, traditional paper finding-aids.&lt;br /&gt;&lt;br /&gt;With this knowledge in hand, Ruth goes on to justify the decisions that the EAD working group made. First, she justifies the decision to minimize the number of elements created, arguing that very early on it was decided not to create an element for every structural decision that might have been made in various local practices and instead allow for local practice with the &amp;lt;odd&amp;gt; and &amp;lt;add&amp;gt; elements (Other Descriptive Data and Adjunct Descriptive Data respectively). 
Each has its own use: the &amp;lt;odd&amp;gt; element provides a space for local practice data while the &amp;lt;add&amp;gt; element allows an archivist to present further information as he or she sees fit, enclosing a series of &amp;lt;p&amp;gt; tags within this poncho style element. The other major decision made, according to Ruth’s article, was to use generic terms rather than more specific vocabulary that may be familiar in some institutions but not in others. &lt;br /&gt;&lt;br /&gt;When she gets into a discussion of hierarchy in EAD, Ruth gets to the heart and soul of the value of EAD for the archival community. This is probably why she spends the bulk of her essay discussing the finer points of this topic. The EAD DTD is divided into four major sections: the &amp;lt;eadheader&amp;gt;, &amp;lt;frontmatter&amp;gt;, &amp;lt;titlepage&amp;gt;, and the &amp;lt;archdesc&amp;gt; elements. Each element encloses information relevant to the overall collection of records, but for hierarchy the &amp;lt;archdesc&amp;gt; element is the most important. Using the &amp;lt;dsc&amp;gt; and &amp;lt;did&amp;gt; elements, according to Ruth, the EAD working group allowed for near infinite subordinate components--records series, record groups, sub groups, sub sub groups, and so on--to be presented in a single EAD file and therefore reflect the hierarchical and non-item-specific nature of an archival repository. In this way, the intellectual arrangement of the finding-aids is reflected in the EAD DTD but not the physical structure of the documents themselves, since stylesheets can be used to transform the EAD records into HTML documents that are human-readable. The EAD record therefore serves as an intermediary, like most metadata records in the digital era, and not as the final product presented to the user, as metadata was in the pre-digital era. &lt;br /&gt;&lt;br /&gt;Ruth spends the rest of the article presenting a broad overview (with samples) of the EAD record itself. 
One highlight of this overview is an example of the hierarchical work of the &amp;lt;dsc&amp;gt; element, which allows for varying levels of description, from record series to group to sub group to item, with the lower levels inheriting description from the higher levels. She then concludes with a summary of the meritorious history of EAD development, which has been guided since the beginning by the needs of the archival community in order to benefit their users.&lt;br /&gt; &lt;br /&gt;The article is an interesting glimpse, for me, into the development of a major metadata scheme as well as a revealing look into SGML, of which I know very little. It is helpful to see how certain decisions were made (from an individual who was present during those decisions) and why they were made. It is also helpful to see that many times decisions are made not for short-term convenience, but rather for long-term quality and ease of use of the finished products. The EAD DTD was developed in order to facilitate intellectual access to all the archival collections, and while this is an ambitious, long-term project, it is a worthy goal to pursue, and for that reason the DTD was developed with a mind towards making the highest quality finding-aids possible for use in the digital age. 
It has its problems, as many have pointed out, but EAD is an admirable if complicated metadata scheme.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113318873973581053?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113318873973581053/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113318873973581053' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113318873973581053'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113318873973581053'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/review-7-janice-ruth-on-ead.html' title='Review #7: Janice Ruth on EAD development'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113233930419135119</id><published>2005-11-18T13:41:00.000-05:00</published><updated>2005-11-18T13:41:44.210-05:00</updated><title type='text'>musing on what it's all about...</title><content type='html'>My supervisor posted a very interesting question to the metadata librarians listserv that generated a good deal of discussion for a relatively quiet listserv. This question (what is it that metadata does for the whole information industry and, more importantly, what does a metadata specialist do for metadata and for the information industry?) caused me to think. 
Metadata is a relatively new word for a very old concept--information that structures still other information--but it's doing something that has never been done before, by necessity of course in the digital world of exploding formats and volume of information. I put some thought to the question and this is what I came up with. &lt;br /&gt;&lt;br /&gt;Metadata is digital information. It is more than merely scattered bits of data, since if it were scattered it would be of little use to anybody or anything. It is more than a series of fields pulled from multiple tables in a relational database; however, it is less than content. Rather, metadata is more like grammar and syntax: it is the structure for the digital sentence. This by itself might be simple enough, but digital sentences reflect our increasingly specialized era, and mere periods, question marks and semi-colons are no longer effective for creating the kind of structure that different information communities need. Yes, there is still a place for the plain but reliable period or the awkward yet useful comma, and that is why we have Dublin Core, but there is need now for more specialized marks like the preferred citation form for a manuscript collection, or the instrument for which a particular piece of sheet music is written, or the location on the network or the place in the streaming video file where the resource is found. All of this, both traditional syntax and new, more specific, more granular structural elements, represents metadata and makes sense of all those digital sentences floating out there that without metadata would be meaningless, un-findable and not of use to anybody, much less the individuals in the information communities that need the information.&lt;br /&gt;&lt;br /&gt;A metadata specialist, therefore, has at least two primary purposes in the organization in which he or she works. 
First, the most significant task is to know almost all of the metadata schemes, or digital sentence structures (nobody can know every one), and how to use them to produce effective and findable information. Knowing how to use them most effectively is not itself a separate purpose, because without it knowing what the different digital sentence structures are is of little purpose at all. Yet being able to uncover the most effective uses of these structures is more than merely reading a set of documentation about a particular sentence structure. It is very much problem-solving: knowing what to use when and how to use it. Second, the metadata specialist is an ambassador for better digital sentence structure. These structures are complex, their meanings and reasoning obscured to most except those who have trained to decipher them, and often the other information professionals as well as the professionals who create the information in the first place need convincing in order to grasp the full scope and value of the digital sentence structure to themselves and to others. A metadata specialist therefore walks in two worlds: the world of code and angle brackets and machine-readability, and the world of human emotions, insecurities and anxieties. Being able to switch back and forth between those worlds is essential to being a good metadata diplomat.&lt;br /&gt;&lt;br /&gt;The problem that metadata specialists encounter is that most people are not this way or at least they don’t think of themselves this way. They are either &amp;quot;people persons&amp;quot; or they are &amp;quot;gearheads&amp;quot; and ne’er the two shall meet. This is the way we have learned to think about the world and the people on it. The metadata specialist is out there beating the bushes, trying to draw everybody into the new digital reality. Admitting the truth that this dichotomy is non-existent, and that we are all both emotional and intuitive as well as computer literate and computer minded, sometimes scares people. 
&lt;br /&gt;&lt;br /&gt;The trick over the next few years will be assuaging that fear and showing it as baseless anxiety. I’m still not certain how I will do that, however, just that this is the root of the resistance that metadata specialists encounter.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113233930419135119?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113233930419135119/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113233930419135119' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113233930419135119'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113233930419135119'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/musing-on-what-its-all-about.html' title='musing on what it&apos;s all about...'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113199355768608605</id><published>2005-11-14T13:39:00.000-05:00</published><updated>2005-11-14T13:39:17.746-05:00</updated><title type='text'>Baby steps towards knowing XSLT even better...</title><content type='html'>This morning I worked on an XSLT stylesheet to transform a slew of Excel spreadsheets into a collection of MODS records. 
This itself is no difficult task; transforming anything to MODS metadata is practically second nature at this point in the semester, and I think that transforming Excel spreadsheets into something as simple as MODS ought to be a couple-of-hours-on-a-Tuesday-afternoon job. However, in this case, there are a couple of complications that will slow things down a bit. &lt;br /&gt;&lt;br /&gt;The most important thing is the state of the data that has been received. Out of 35 individual spreadsheets for approximately that many individual Nobel Peace Prize winners (and a couple of random collections of photographs just to spice things up), there are 6 different description schemes, with some interchangeable elements and some elements unique to each scheme. The other, much more problematic, obstacle to easy transformation is that while the spreadsheets are human-readable (with all sorts of nifty spacing, padding and big bold titles), all of this human-readability makes them impossible for a machine to read. This adds an extra step to making the data machine-friendly; I have to go through all 35 Excel workbooks, break them up into separate worksheets and eliminate all those extra spaces that mess with the XML export. &lt;br /&gt;&lt;br /&gt;I suppose this is the big lesson of this project. I ought to just expect messy data. The second lesson is that messy data is why things like &amp;lt;xsl:if&amp;gt; and &amp;lt;xsl:choose&amp;gt; exist. 
Of course, there's a myriad other things to do with these wonderfully diverse tools (the hammer of the XSL language), but for what I'll be doing with them they exist to deal with messy original data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113199355768608605?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113199355768608605/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113199355768608605' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113199355768608605'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113199355768608605'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/baby-steps-towards-knowing-xslt-even.html' title='Baby steps towards knowing XSLT even better...'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113163718154864467</id><published>2005-11-10T10:39:00.000-05:00</published><updated>2005-11-10T10:39:41.563-05:00</updated><title type='text'>METS: useful but no fun at all</title><content type='html'>Since my supervisor is away at a conference learning all sorts of interesting stuff this week, I'm my own boss with a list of projects to work on. 
One of these projects was to learn about METS metadata, which is a highly structured metadata scheme used not so much for descriptive as for technical metadata, in this case to run the pageturner application that allows the user to read a digitized text online in much the same way (page by page) that he or she would read a print book. &lt;br /&gt;&lt;br /&gt;I've created the XML record that I was instructed to create in METS. It's...interesting. I can immediately see how it's useful. Anything that will tie individual URLs together to form a cohesive whole--like a book, for example, or a series, perhaps--is an extremely valuable tool for a digital library to have. However, my supervisor was right: it's no fun to create a METS document. It's a matter of entering URI references over and over again and establishing mapped tables of contents, so that METS can put the document or book back together into a single unified whole for its human users instead of the scattered collection of image files that the digital document is without METS. That is, the true power of METS is in its ability to create both a human-readable table of contents and a machine-readable table of contents simultaneously. METS is truly a wrapper metadata scheme meant to facilitate the use of various other metadata schemes within a tidy application that ties it all together for the user so that the general public never knows or sees the complications going on behind the scenes. 
Truly, METS is the wizard of metadata schemes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113163718154864467?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113163718154864467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113163718154864467' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113163718154864467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113163718154864467'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/mets-useful-but-no-fun-at-all.html' title='METS: useful but no fun at all'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113122966441385925</id><published>2005-11-05T17:27:00.000-05:00</published><updated>2007-04-12T09:16:35.393-04:00</updated><title type='text'>Review #6: Peter Morville on Findability</title><content type='html'>In a &lt;a href="http://www.infotoday.com/online/nov05/morville.shtml"&gt;recent article for Information Today&lt;/a&gt;, Peter Morville, who has always been a lightning rod for bleeding-edge information and library science theories and ideas, makes a controversial argument for a new theory of these two joint areas of scholarship and industry. 
He proposes that rather than merely focusing on organizing knowledge for users who approach it through the traditional or non-traditional venues that we as librarians have come to expect and build architectures around, we instead focus on making information findable through a variety of paths and directions that we as librarians can’t possibly predict or hope to index in total. &lt;br /&gt;&lt;br /&gt;Mr. Morville uses two examples as evidence of the need for this new model of information and library science. First, he cites the continuing use of popular search engines like Google that wreak havoc on our carefully constructed information architecture by ripping apart our web sites and caching the individual web pages that can then be retrieved by users through keyword relevancy searches. This can cause numerous difficulties for the users when they click on dynamically created links to pages that, once they are there, present little or no information about where the information is coming from as well as information that is only peripherally relevant to their search. It also wreaks havoc with the concept of authoritative information, as users find pages that are part of sites developed by commercial interests or less savory individuals who are distributing incorrect or, worse, false information rather than retrieving pages on sites developed by highly authoritative, reputable and distinguished organizations that present carefully researched, proven and well-written information that is of high relevance and importance to the user’s actual information need. The popular search engines, while making information easily retrievable by all people regardless of information retrieval skills and ability, also disrupt the traditional notions of information authority on which our concepts of information architecture, cataloging and library access are built. 
Instead, the users of the Web are left with a hodge-podge of files containing bits and pieces of information that may or may not be helpful to them. &lt;br /&gt;&lt;br /&gt;According to Mr. Morville, it is the duty of information and library professionals to bring our historically proven constructions and models of authority, information organization and architecture to the realm of Web search. We can do so by optimizing our architecture for random entrances to pages deep within the site, and not merely for users accessing the site through the front door or home page, as well as by optimizing our pages for the keyword search relevance and link algorithms of search engines like Google. &lt;br /&gt;&lt;br /&gt;In order to do this, Mr. Morville proposes a model based on three questions that any information architect needs to ask him or herself before beginning the development of a site. First, he or she must ask: Can the user find the website? Second, he or she must ask: Can the user navigate the site? And, finally, he or she must ask: Can the user find the content despite the site? While all three combine optimizing for search engines with the traditional information architecture and organization that library and information science has been built upon, the third is the most significant for this new era of find-ability that Mr. Morville is proposing. 
He argues that by seeking to develop a web site with informational content that is easily and intuitively available to all users, who will be seeking out this information using a variety of keywords or terms, we can bring our notions of information authority, organization and access into the age of the Web search engine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113122966441385925?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113122966441385925/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113122966441385925' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113122966441385925'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113122966441385925'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/review-6-peter-morville-on-findability.html' title='Review #6: Peter Morville on Findability'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113121263204444744</id><published>2005-11-04T17:30:00.000-05:00</published><updated>2005-11-05T12:43:52.056-05:00</updated><title type='text'>First big project done...now onto METS</title><content type='html'>It’s been a long week for this internship. I started it, on Monday, by completing the revision of EAD 1.0 records into EAD 2002, then spent over 4 hours struggling with an XSLT transformation. 
What stumped me was some confusion over how the relationships and hierarchies work. In the world of programming, and in the world of programming for the Web in particular, I’ve discovered it’s always about the hierarchy. The inventors of the Web were academics, after all. But, after much gnashing of teeth and a crucial bit of help from my supervisor/on-the-job professor, I managed to solve the problem, and so an hour before I leave for the weekend I have completed the project of the past two weeks.&lt;br /&gt;&lt;br /&gt;I’ve learned that XSLT is like any other programming language: the most important thing is to know what you want to happen and where, exactly, you want that to happen. I’ve taken to keeping scratch paper next to me as I program so that I can jot down file paths and element attributes I need to use in certain circumstances. Coding is truly learning how to rewire your brain to think like a computer: all those little assumptions we make about where to find the coffee creamer and how to pour it into the cup before we pour the coffee in need to be strictly enumerated for the computer. Without each baby step the computer freezes, paralyzed with indecision and doubt, and presents a cryptic error message that it is my job to decipher. I’ve gotten pretty good at that particular task. &lt;br /&gt;&lt;br /&gt;The process of creating a transformation style-sheet is actually quite fun, and I can only imagine that transforming the MODS records into HTML output is even more fun. I predict that establishing rules for HTML output is less about the logical structure of transferring one element in a particular place in the record into a very different place in the new record and more about the design of a workable user-interface so that the client can quickly and easily understand the information presented to him or her at the click of a mouse. 
My only question after this project is about my failed attempt to use the XSL attribute displayLabel that was rudely rejected by the validator for the MODS output. I wonder how that might be used. It makes sense that it might be used in HTML output as a trick of the user-interface as a means of making the information clearer for the patron surfing the database. In all, however, the XSL transformation went very well and I look forward to presenting my work to the individual responsible for the project into which it’s going to be folded.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113121263204444744?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113121263204444744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113121263204444744' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113121263204444744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113121263204444744'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/11/first-big-project-donenow-onto-mets.html' title='First big project done...now onto METS'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113078099634393687</id><published>2005-10-31T12:49:00.000-05:00</published><updated>2005-10-31T12:49:56.403-05:00</updated><title type='text'>Review #5: Leslie Johnston on 
building a Fedora repository for the UVa</title><content type='html'>I have finally mulled over the concept of Fedora repositories enough to actually seek out more information, after the whirlwind introduction about a month ago by Thorn Staples. For this reason, I was pleased to see that the newest issue of D-Lib Magazine has an &lt;a href="http://www.dlib.org/dlib/october05/johnston/10johnston.html"&gt;article on the University of Virginia's Fedora project&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Leslie Johnston seeks to explain the workflow process that the University of Virginia digital library program undertook in order to achieve this effort. She begins with a description of the top priorities. Since, as she points out, the collection was divided up by format, these priorities were the formats that absolutely had to exist in the Fedora repository if it were to be deemed a success. These were: images, texts, and EAD finding aids, all of which existed solely in the digital realm. What's interesting is that while the need for a digital image collection was the primary impetus for the program (there was none prior to the Fedora collection), the priority on texts and finding aids was made not because of an overwhelming need being vocalized by users but rather because the library itself needed a place to put its legacy digital holdings as well as a place to put EAD finding aids transformed from 1.0 to 2002. The priorities seem to have been chosen based on a mix of overwhelming user need and library need.&lt;br /&gt;&lt;br /&gt;The next step, and this was an iterative process throughout the entire development of the Fedora repository, was to elicit feedback and analysis about the collection itself that would go into the repository. Before they could do anything, they had to know exactly what was needed and what would go into the repository so that they could sculpt the repository around the collection that it would house. 
It's like building a new house to suit exactly your every whim and need rather than merely rehabbing an older house to the best of your ability. This thought appears over and over throughout this essay: the idea that the reason they used Fedora was so that they might create a digital repository system from the beginning to suit precisely the needs of their institution. &lt;br /&gt;&lt;br /&gt;The analysis process took the shape of deciding how each format type would be displayed to the user and what programs would be required to make these displays. Since Fedora consists of three elements (metadata, data stream, and behaviors), it is imperative that the desired behavior be understood before beginning the development process; otherwise a system might get created that does not at all fit the needs of the institution's users. As Ms. Johnston makes explicit in her conclusion, the only way to ensure that this does not happen is to continually elicit feedback from the users of the system (both other staff and professionals as well as clients). &lt;br /&gt;&lt;br /&gt;She describes the feedback about the prototype program (what she labels phase 1) as well as the first publicly available version (what she labels phase 2). In both cases, the primary evaluators were librarians in the University of Virginia system (for phase 1) and faculty in the arts and humanities academic disciplines. The feedback during both phases caused the team to go back to the drawing board and/or seek out means to refine the system to better suit the needs of the users.&lt;br /&gt;&lt;br /&gt;However, I was most interested in the best practices guidelines that the UVa Fedora team set out for implementing TEI records into the repository. Ms. Johnston explains how they broke the TEI collection into three &amp;quot;content models&amp;quot; that they then used to determine the behaviors that would be elicited from the EAD datastreams. These models were: 1) GenText, 2) Book, 3) PageBook. 
Each presents the relationship between the marked up text and the scanned image of the page, which in the Fedora repository would be displayed side-by-side. The TEI records were perceived therefore as merely related to the page images and not complete unto themselves. However, the page image itself is not complete without the TEI record. The two, the TEI record and the page image, complement one another and provide not necessarily higher functionality but best functionality for the user. &lt;br /&gt;&lt;br /&gt;In her conclusion, Ms. Johnston emphasizes the importance of discussion with the user base of the system for the success of the University of Virginia Fedora repository. She breaks the process into 4 parts: 2 phases of development separated by 2 phases of discussion with users.&lt;br /&gt;&lt;br /&gt;The Fedora repository, however, is no exception among digital repository programs. Regardless of the architecture used, it is always wise to continually seek evaluation and analysis of the project at every step.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113078099634393687?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113078099634393687/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113078099634393687' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113078099634393687'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113078099634393687'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/review-5-leslie-johnston-on-building.html' title='Review #5: Leslie Johnston on building a Fedora repository 
for the UVa'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-113019053052782406</id><published>2005-10-24T16:48:00.000-05:00</published><updated>2005-10-24T16:48:50.533-05:00</updated><title type='text'>EAD coming out of my ears...</title><content type='html'>I've spent the entire day, from 9 this morning, combing through EAD 1.0 records and cutting and pasting replacement doctypes, elements and attribute types in order to make old records conform to the newest, brightest, shiniest of EAD standards. I'm not sure what I learned today except that metadata is even more painstaking an effort than I previously imagined. I can see, doing this sort of thing, why Larry Wall is seen as such a mighty hero of the ubergeeks who toil away at these kinds of tasks. Regular expressions do make this kind of task very, very easy rather than a two-day affair of toil and sweat over a keyboard. &lt;br /&gt;&lt;br /&gt;But, enough complaining for now. &lt;br /&gt;&lt;br /&gt;To keep my intellect occupied while I complete the retrofitting of the old records I've been assigned the secondary task (secondary only in the sense that it is not as immediately urgent as the first) of creating an XSLT stylesheet that will transform still another set of EAD records into MODS. I'm discovering, however, that this task is much more complex than first thought. The EAD records are built upon a completely different set of assumptions and theories than the MODS scheme. I can only assume it's doable, but it's a matter of transforming Golden Delicious apples into Red Delicious ones. Same basic thing, very different details.&lt;br /&gt;&lt;br /&gt;I can do it, though. 
Just call me Gregor Mendel of the library world!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-113019053052782406?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/113019053052782406/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=113019053052782406' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113019053052782406'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/113019053052782406'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/ead-coming-out-of-my-ears.html' title='EAD coming out of my ears...'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112982730794512373</id><published>2005-10-20T11:55:00.000-05:00</published><updated>2005-10-20T11:55:07.996-05:00</updated><title type='text'>Review #4: Anatomy of Aggregate Collections</title><content type='html'>Since D-Lib seems to be the premier e-journal for digital-library- and especially metadata- related research, I have tried to develop a habit of checking their website regularly. 
For that reason, I came across an article by &lt;a href="http://www.oclc.org/research/staff/lavoie.htm"&gt;Brian Lavoie&lt;/a&gt;, &lt;a href="http://www.oclc.org/research/staff/connaway.htm"&gt;Lynn Silipigni Connaway&lt;/a&gt; and &lt;a href="http://orweblog.oclc.org/"&gt;Lorcan Dempsey&lt;/a&gt; in which they analyze the Google Print collection using a traditional collection development model. &lt;br /&gt;&lt;br /&gt;They answer four important questions. First, they seek to determine its coverage of unique books. Second, they try to find out the distribution of languages represented. Third, they estimate the percentage of books that are out of copyright and therefore freely available. Finally, they ask the most interesting question of all: how does the Google Print collection compare when the same experiment is done using five different libraries that more accurately reflect the typical libraries in North America? With this study, they develop an important model for determining the most effective digitization efforts.&lt;br /&gt;&lt;br /&gt;Firstly, the article seeks to determine the amount of coverage among the Google 5, as Lavoie, Connaway and Dempsey term the 5 premier research university libraries that are contributing to the Google Print program. In doing so, they make several intriguing discoveries. First, they uncover that in these 5 libraries the duplication of collecting has been steadily decreasing over the past thirty years at a rate of approximately 1-2% every five years. Secondly, they discover that out of a total 32 million books cataloged in OCLC WorldCat only 10.5 million unique books are covered in the Google Print system by the 5 contributing libraries. That is, 33% of the system-wide collection is represented in the Google Print program. 
They do, however, make one caveat about this data: they have used the FRBR definition of expression and manifestation to determine these numbers, whereby 2 different imprints of a single title are 2 different manifestations. Also, because the duplication of materials collection between these 5 libraries is steadily decreasing, the likelihood of current materials being uniquely represented in the 5 is greater than for older materials. &lt;br /&gt;&lt;br /&gt;Secondly, the authors seek to determine how many languages are represented in the Google 5's combined collection. They find that, just like the OCLC WorldCat system, just under fifty percent of this North American-centric collection is English language, while French, German, and Spanish language materials make up 25% of what's left. That is, while there is an English-language bias reflected naturally by libraries in an English-language speaking nation, there is still a significant number of languages represented.&lt;br /&gt;&lt;br /&gt;Thirdly, the authors seek to solve the question of just how many of the books in the combined Google 5 collection are in or out of copyright. They have determined that about 6.5% of the combined collection is out of copyright and that approximately 70% of that portion is uniquely held by any one library. That is, a significant fraction of the unique materials in the Google 5 collection are in fact out of copyright and therefore immediately available and, what's more, uniquely represented in each individual collection.&lt;br /&gt;&lt;br /&gt;The most intriguing question, for me, is how the Google 5 collection compares to a similar study done for a different hypothetical combined collection. The authors use for this study a small liberal arts college, a large American public university, a large Canadian public university and a large metropolitan American public library. 
The authors hoped to determine what differences in coverage were represented by a more typical sample of libraries. Since over 40% of print books in the collection of 32 million represented by WorldCat are uniquely held by a single library, the authors sought to discover how coverage might be increased by digitizing collections from other libraries and not merely the top 5 universities in the world. &lt;br /&gt;&lt;br /&gt;They discovered that the most effective digitization effort would be to enlist all the OCLC system-wide libraries in providing digital texts from their unique collections in order to ensure the highest percentage of unique titles. There were 5.6 million unique titles in the new collection, which roughly equalled 74% of the total collection, while only 58% of the Google 5 collection is unique titles. That is, the most effective digitization effort would be to enlist a greater proportion of the system-wide libraries in OCLC WorldCat in order to capture a higher percentage of unique titles. &lt;br /&gt;&lt;br /&gt;Thus, the best model for digital libraries to implement is one of collaboration between different types of collections. 
This collaboration would best serve the combined users of all the libraries involved by providing the maximum output of unique information available to all users.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112982730794512373?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112982730794512373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112982730794512373' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112982730794512373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112982730794512373'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/review-4-anatomy-of-aggregate.html' title='Review #4: Anatomy of Aggregate Collections'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112981986888474763</id><published>2005-10-20T09:50:00.000-05:00</published><updated>2005-10-23T10:26:01.733-05:00</updated><title type='text'>Review #3: Martha Yee on FRBRized OPA(FL)s into true OPACs</title><content type='html'>I've just read a fascinating article by Martha Yee in the July 2005 issue of &lt;i&gt;Information Technology and Libraries&lt;/i&gt;. 
In it, she makes a highly compelling argument: while we do not currently have true online public access catalogs, instead settling for the only thing available (online public access finding lists that confuse and repel many library users), it would be possible to use FRBR to renovate the existing catalog and develop a true online public access catalog that could actually present users with the answer to their most common search--a title-author combination. &lt;br /&gt;&lt;br /&gt;Ms. Yee begins her argument with a description of some of the most common problems that users have with OPACs. She invokes Seymour Lubetzky's ca. 1960 advice that a library catalog ought to do at least three things: 1) allow a user to search using a combination of author's name and title; 2) match the search terms against author authority records and simultaneously title authority records; 3) produce a list of expressions and manifestations of a given work by a given author. She uses this list to show the grim state of current library OPACs, which don't do any of these things. She admits that number 1 is possible, albeit by a slightly backwards and non-intuitive means that is obvious only to information professionals trained to recognize it. However, she remains adamant throughout the article that this is a highly significant failing (and the reason, she argues, that many library users turn to Google and other popular search engines with their information needs). She co-opts a term from Lubetzky and refers to these pseudo-OPACs, which simply present lists of items that have some connection to the author and title search terms a user enters, as not OPACs at all but rather Online Public Access Finding Lists, which the user is forced to decipher on his or her own out of the mishmash of data that libraries present about their items. These finding lists, Ms. 
Yee argues, are not acceptable in the modern era when users are inundated with information: they want what they are looking for to appear immediately before them, and if it doesn't they leave.&lt;br /&gt;&lt;br /&gt;After painting this bleak portrait of library OPACs, Ms. Yee goes on to propose a solution. She argues for creating a new FRBRized OPAC (which she argues doesn't exist yet, despite attempts by VTLS and other companies to develop systems with this capability built in). Such an OPAC would use the work-level, expression-level, manifestation-level and item-level identifiers already extant in the millions of MARC21 records in the library world to create work, expression and manifestation authority records, which it would consult before returning a list of expressions and manifestations to the user seeking out, for example, &lt;pre&gt;&lt;i&gt;Hamlet&lt;/i&gt; by William Shakespeare&lt;/pre&gt;. The search results page would show a header for the work by the author and, underneath, a list, with result counts, of expressions and manifestations that a user might click through to see the actual items relevant to this search that the library has available. &lt;br /&gt;&lt;br /&gt;This radical shift in cataloging would require significant changes in workflows for the library profession, especially for catalogers. Firstly, it would require catalogers to begin to create much fuller and richer authority records, a practice that, she argues, isn't currently followed because OPACs don't let users access these authority records; the argument goes, why spend so much time and labor creating authority records when users don't even get to use them? 
Secondly, she argues, it will require and invigorate a professionalization of the cataloging field, as it becomes increasingly important that those creating catalog records know and understand the theory behind the practice of the MARC21 record and can implement that theory correctly rather than scattering careless errors throughout the catalog system. She gives one example of such errors: catalog records that place the Author/Creator element in a 700 field of the MARC21 record, thereby making it impossible for a user to find those records using an author search. In order to implement her FRBRizing of the MARC21 catalogs, Ms. Yee argues that current MARC records will have to be combed through and set right so that the FRBR system will be able to discover the various identifiers and sort the results accordingly for users. MARC records that are incomplete or inaccurate are not machine-readable in this sense and are therefore not discoverable by a FRBRized OPAC system. &lt;br /&gt;&lt;br /&gt;This argument is a much-needed one in both the cataloging and the digital library world. Without a solid foundation of a complete and accurate library catalog and a usable OPAC with which our users can find the records that satisfy their information needs, most digital library ventures will fail. Ms. 
Yee makes a powerful argument in a time when libraries everywhere are rushing to create digital repositories of their collections that, first things first, let's set straight the means by which our users find the information we have before we go ahead and change the format of that information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112981986888474763?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112981986888474763/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112981986888474763' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112981986888474763'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112981986888474763'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/review-3-martha-yee-on-frbrized-opafls.html' title='Review #3: Martha Yee on FRBRized OPA(FL)s into true OPACs'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112958380608113205</id><published>2005-10-17T16:16:00.000-05:00</published><updated>2005-10-17T16:16:46.090-05:00</updated><title type='text'>And the mapping band marches on</title><content type='html'>So, as it turns out, my metadata professor was right: a large chunk of what metadata professionals do is figuring out how to 'map' one scheme into much more contemporary, much more universal, 
metadata schemes. My project for today has been to develop a crosswalk from a Lilly Library scheme for what seem to be photographs into MODS so that all this data (and there is a lot of data here) can be exported into the Fedora open access initiative of which Indiana University has recently become a member. This has meant, among other things, determining which fields need to be added, which fields need to be cut, and which fields ought to be simplified.&lt;br /&gt;&lt;br /&gt;I can't discuss this project in too much detail since I am currently in the midst of it. It will probably be the end of this week before I can say I've done the mapping completely. But, off the top of my head, I see at least two problems with the current scheme and its ability to integrate into MODS. Firstly, this being a scheme developed by archivists, the bulk of the scheme (the first 80%, to estimate) is devoted to pinpointing precisely who created the record itself, presumably to know where any inaccuracy in the record might have come from. In MODS, however, there is very little space for this type of information and it is just not satisfactory to toss 80% of the record into &amp;lt;recordInfo&amp;gt;&amp;lt;recordOrigin&amp;gt;blahblahblah&amp;lt;/recordOrigin&amp;gt;&amp;lt;/recordInfo&amp;gt;&lt;br /&gt;and be done with it. Secondly, in MODS the assumption seems to be that one is dealing with a relatively self-contained entity that can be described completely by itself. However, archivists seem to work under the opposite assumption: no one thing can be described independently. Rather, every individual item in the archival repository relates very closely to the other individual items on either side of it in some big box that itself relates very closely to the big boxes on either side of it. This concept of a record series is something MODS lacks the functionality to handle, I think. I'm not sure what could be used to replace it. 
EAD is supposed to be universal; but, as is obvious to anyone who starts mucking about in the midst of it, it isn't really. It's meant to facilitate information/data transfer within a very specific subset of the information world: archivists. I suppose my real complaint in this rambling post is that, if this is any evidence, there seems to be very little real collaboration between the different arenas of the information industry, and this lack imperils information professionals and alienates their clients and customers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112958380608113205?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112958380608113205/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112958380608113205' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112958380608113205'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112958380608113205'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/and-mapping-band-marches-on.html' title='And the mapping band marches on'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112922233637920616</id><published>2005-10-13T11:52:00.000-05:00</published><updated>2005-10-13T11:52:16.386-05:00</updated><title type='text'>XSLT, yay!</title><content type='html'>So, 
I've finally reached that pinnacle of metadata knowledge: I am working on Extensible Stylesheet Language Transformations (XSLT for the real Web people out there) implementation. I am slowly beginning to get it. On the plus side, it doesn't seem to be nearly as complex as Perl/CGI, which is a good thing.&lt;br /&gt;&lt;br /&gt;On Monday I worked on creating an XSLT transformation for a very simple mapping to Dublin Core from a local (MS Access generated) scheme for sheet music at the Lilly Library. It was an intriguing learning process. I had the opportunity to realize that, yes, every programming language (or even sorta, kinda programming, sorta, kinda Web stylesheet language like XSLT seems to be) uses the same concepts. The if/elsif tool is seemingly universally available. I honestly don't know what I'd do without it. It's the "the" of programming--always there, frequently misspelled. The only issue I encountered was trying to work publisherPlace and publisher into a single field using AACR syntax. In the local scheme these are two fields, while in DC there is no publisher place field, just publisher. However, after a quick call for help to my supervisor I was able to solve the problem quickly and easily. &lt;pre&gt;&amp;lt;xsl:choose&amp;gt;&lt;/pre&gt; is a powerful and fairly intuitive friend for the XSLT developer. I wish Perl had this type of function rather than having to rely on long, complicated series of if/elsif statements integrated into while loops (which themselves bring many headaches).&lt;br /&gt;&lt;br /&gt;Unfortunately, today there seem to be networking problems in the lab. This means that I cannot log onto the computer to get access to the DLP server and see my work. It's not too much of a problem: I can catch up on reading, update this blog for the week, and generally begin working out some of the intellectual kinks of metadata. 
I only wish I could get on and look at the readings that my supervisor wants me to read in preparation for a meeting on Monday about creating a new program for the digital image folks that mirrors some (not all) of the functionality of the Variations2 digital music program. I can always get at them on Monday morning, though, or else discover how to log on from my laptop and look at them at home.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112922233637920616?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112922233637920616/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112922233637920616' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112922233637920616'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112922233637920616'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/xslt-yay.html' title='XSLT, yay!'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112882760839233272</id><published>2005-10-08T22:01:00.000-05:00</published><updated>2005-10-08T22:13:28.400-05:00</updated><title type='text'>News, news, news</title><content type='html'>I was excited to hear on Thursday that my idea about using RDF to modify TEI might just have some merit. 
&lt;br /&gt;&lt;br /&gt;I spent my time in the lab meeting with my supervisor and discussing various topics--among them the project I'll be working on starting next week. I'll be stepping up to the XSLT plate, and I'm eager to start. If working with TEI has taught me anything besides an appreciation for the incredible complexity of TEI itself and of the elements needed to mark up the text of documents, it's an appreciation for code, and I feel very much ready to step up to the next challenge. &lt;br /&gt;&lt;br /&gt;I also look forward to developing my idea into a full recommendation and to really delving into RDF. Currently, my knowledge goes only just below the surface of that wonderfully complex and elegant standard. I only hope that after I educate myself further about the subject I can contribute a recommendation to the digital library program here. &lt;br /&gt;&lt;br /&gt;In all, the internship goes well. Sometimes, I have to force myself to spend time on my other classes and put the internship work and reading aside. Intellectually, the readings are stimulating and even TEI has (at first) its rewards. It's good to know that this subject that I've chosen as my future profession (for the next five years anyway) is actually still interesting to me now that I'm starting to get into the nitty gritty reality of it. &lt;br /&gt;&lt;br /&gt;That's it for now: watch for a third article review about FRBR. I'm engrossed in last year's Cataloging &amp; Classification Quarterly special issue on the subject. I'm still not quite sure what to think of it. 
It all needs time to be mulled over.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112882760839233272?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112882760839233272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112882760839233272' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112882760839233272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112882760839233272'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/news-news-news.html' title='News, news, news'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112845696817187055</id><published>2005-10-04T15:16:00.000-05:00</published><updated>2005-10-05T15:09:23.836-05:00</updated><title type='text'>Internship update #($whatever)</title><content type='html'>I'm in the thick of things now; it's at the point where I can come in on Monday and am fairly self-motivated for what I do with my time. &lt;br /&gt;&lt;br /&gt;I spent the entire morning and most of the afternoon working on TEI markup of the Indiana Authors biographical information. As is, I got through maybe six pages. 
It is tremendously slow going since not only does each individual reference to a person or area or topic or date require markup but nesting is imperative; thus it's not acceptable to do &lt;pre&gt;&amp;lt;settlement&amp;gt;Chicago&amp;lt;/settlement&amp;gt;&lt;/pre&gt;; rather, you have to use &lt;pre&gt;&amp;lt;placeName&amp;gt;&amp;lt;settlement&amp;gt;Chicago&amp;lt;/settlement&amp;gt;&amp;lt;/placeName&amp;gt;&lt;/pre&gt;&lt;br /&gt;And that's not even adding the type attribute values, which are required in the majority of instances of many of the elements. I guess what I am trying to say is that there has to be a better way. I understand that XML is hierarchical and therefore any scheme built on this model must use this basic concept. However, must it be so exacting? How can we expect every library to use this to mark up its texts? I don't even want to think about what it would take to mark up the entire Indiana library collection with this scheme. &lt;br /&gt;&lt;br /&gt;I propose an RDF-based scheme to serve this purpose. That way, the hierarchy is assumed and does not need to be explicitly stated in each instance in the form of tags. Also, using RDF instead of SGML/XML would make it much easier to add localized classes/properties/elements/attributes into the mix rather than relying on TEI to make an enormous and complex set of elements that the majority of collections being marked up might not even call for. &lt;br /&gt;&lt;br /&gt;After the morning of coding, I took a break to do some reading on digital library workflows in anticipation of a meeting at 2pm with my supervisor and a programmer who's been working on the EVIADA project, a digital video repository. The meeting went well, and it looks like I'll be working on XSLT as my next project as soon as I formally renounce/disavow TEI. Kidding, of course. I look forward to that since, as my supervisor keeps reminding me, XSLT is a complicated and difficult but crucial part of the job. 
Nothing that's important is ever easy, though, so I anticipate facing off against and triumphing over this next challenge.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112845696817187055?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112845696817187055/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112845696817187055' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112845696817187055'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112845696817187055'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/internship-update-whatever.html' title='Internship update #($whatever)'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112836175942858366</id><published>2005-10-03T12:49:00.000-05:00</published><updated>2005-10-03T12:49:19.436-05:00</updated><title type='text'>Review: Roy Tennant at LITA Forum '05</title><content type='html'>Mr. Tennant is the guru of the digital library world, and I happened to stumble across a &lt;a href="http://spaces.msn.com/members/tatenunley/Blog/cns!1pDfICYtU2CJSPBwSHHZTZVw!382.entry"&gt;transcript of his speech&lt;/a&gt;. In opposition to Mr. Gorman, Mr. Tennant is a voice of tremendous optimism for anything and everything tied up in the digital library world. 
His &lt;a href="http://www.cdlib.org/"&gt;California Digital Library&lt;/a&gt; for the University of California is a shining example of a digital library program with a variety of projects that benefit both the internal community of the university and the external community. &lt;br /&gt;&lt;br /&gt;Mr. Tennant has 2 main theses. Firstly, he argues that librarians have to change the way they do things. He argues that this change will have to hinge on the librarian domination of the Internet for the benefit of our users. Secondly, he argues that efficiency and streamlined information systems are the wave of the future and that if it's broken it's better to abandon it than spend precious time trying to fix it. He points squarely at library OPACs in reference to this argument. Each of these points supports the other to make a compelling argument for change in the modern library world.&lt;br /&gt;&lt;br /&gt;For proof of the need to change, Mr. Tennant points to what he terms "general information resources" like Google and Amazon as trying to take over traditional library tasks but without bothering themselves unduly with the library code of behavior towards information. For these general information resources and for the users who prefer them to the library OPAC, it's all about finding something; whether that thing is the best information is irrelevant. The library's goal, according to Mr. Tennant, is to take the search out of the library's information systems and make it easy for the general user to find the best information. The solution he posits for this thorny goal is to provide collaborative technological solutions among different agencies--other libraries, publishers, and the users themselves. &lt;br /&gt;&lt;br /&gt;Naturally, these technological solutions for collaboration, according to Mr. Tennant, will lead to greater efficiency in the library workflows and in the library digital environment. 
XML processing of ONIX files from the publishers, MODS/METS files from other libraries, and various local metadata schemes throughout the local system as well as from the remote users themselves will provide the efficiency needed to allow for full integration of the best information within the library and make the library a one-stop-shop for the general users--kind of the 7-11 for information.&lt;br /&gt;&lt;br /&gt;I know that this isn't going to be easy to do. Besides requiring massive planning and procedural decisionmaking, such an enormous undertaking will require an enormous amount of money (or creative spending at least) and lots of marketing, something librarians have never been very good at. But, it's nice to hear from somebody who's hopeful about the survival of the library in the 21st century and who embraces change rather than resists it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112836175942858366?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112836175942858366/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112836175942858366' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112836175942858366'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112836175942858366'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/review-roy-tennant-at-lita-forum-05.html' title='Review: Roy Tennant at LITA Forum &apos;05'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112829281673345526</id><published>2005-10-02T17:40:00.000-05:00</published><updated>2005-10-02T17:40:16.763-05:00</updated><title type='text'>Is it a a small stride for humynkind? Or is it a giant leap for librarians?</title><content type='html'>After browsing &lt;a href="http://www.lisnews.com"&gt;LISNews&lt;/a&gt; today, I found &lt;a href="http://www.insideindianabusiness.com/newsitem.asp?id=15312"&gt;this story&lt;/a&gt; about Purdue University's recent creation of a new endowed chair in Information Literacy. Firstly, I have to applaud the effort on the part of this university that has already made many strides in the effort of information literacy education for its students. The event as a whole is something to be celebrated as a moment of triumph for library-kind.&lt;br /&gt;&lt;br /&gt;However, I have two significant reservations about this announcement. &lt;br /&gt;&lt;br /&gt;First, reading the article I was struck by this passage:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;The person filling the W. Wayne Booker Endowed Chair in Information Literacy &lt;b&gt;will conduct research and launch additional initiatives&lt;/b&gt; to increase &lt;b&gt;students' ability to access, assess and integrate information&lt;/b&gt; and make good judgments about what information they choose to use, said Purdue Dean of Libraries James L. Mullins. The holder of the chair will be hired after a national search.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Emphasis is of course mine. That is a very tall order for one position: the work involved sounds more suited to two or three positions. Unless the plan is to develop the chair into a directorship-type position with an IT person, a librarian educator, and a digital librarian each supervising their particular wing of a larger organization within the university. 
&lt;br /&gt;&lt;br /&gt;Second, what is the emphasis of this position? Will it be primarily of an IT nature? Or will it be primarily education related? This passage might answer some of those questions, but it is by no means clear:&lt;br /&gt;&lt;blockquote&gt;With this chair, Booker said he wants to provide students with skills to be lifelong learners. Booker, a 1956 economics graduate of Purdue who received an honorary doctorate of humane letters in 2000, said he wants to see critical thinking and communication skills increased in the United States and abroad.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Booker, of course, is the gentleman presumably behind this great push for an endowed chair in information literacy. He seems to be arguing that the position will be a symbolic one (like many endowed chairs are) as well as a scholarly one devoted to discovering better ways to educate incoming college students about the basic principles of information and the use of it. However, when taken in the context of the earlier statement about the chair's duties, it just makes the chair position bigger and more unwieldy for one individual to maintain. Who will fulfill the requirements? An individual with extensive IT experience and knowledge, who knows about copyright and the ins-and-outs of electronic resources both in the deep Web and the public Web, and who has an extensive knowledge of traditional librarianship and bibliographic instruction techniques? Like I said: it's a tall order. 
I look forward to following this story and seeing who they finally select for the position.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112829281673345526?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112829281673345526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112829281673345526' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112829281673345526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112829281673345526'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/10/is-it-a-small-stride-for-humynkind-or.html' title='Is it a small stride for humynkind? Or is it a giant leap for librarians?'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112801038662532724</id><published>2005-09-29T11:13:00.000-05:00</published><updated>2005-09-29T11:13:06.783-05:00</updated><title type='text'>TEI, MODS, XML, XSLT: oh where or where did  the real words go?</title><content type='html'>I'm nearing the halfway point in this internship, and I've come to the realization that I too am beginning to adopt the acronym jargon of my profession. I use acronyms that are deeply meaningful to me in casual conversation with friends who then stare at me uncomprehendingly, or if on the phone they are struck dumb. 
I am developing into a real, live metadata-(ist? er?).&lt;br /&gt;&lt;br /&gt;But, on to more serious topics.&lt;br /&gt;&lt;br /&gt;I have been given the task of encoding as much of the IN Authors project into &lt;a href="http://www.tei-c.org/P4X/REFTAG.html"&gt;TEI&lt;/a&gt; as I can stand. My supervisor advised me against putting too much time into the effort as, in her words, it's just mind-numbing. I predict that I'll put in about 30 hours of work on this project and then move on to something cooler, sexier and generally more awesome--like XSLT development and thinking about digital library workflow management, et cetera. TEI is intriguing and in a way very fun to know, but the actual implementation is less about problem solving and intellectual exercise than about just typing the keys correctly. It's like building a webpage, where the fun is designing the CSS and the grueling labor is in coding the page itself and making certain everything is well-formed and well-liked by all varieties of Web browsers. Still, I'm finding that actually working on this project is causing my brain to rewire, as I look at code differently now: I am beginning to be able to rapidly differentiate bad XML from good XML. I am become a coder.&lt;br /&gt;&lt;br /&gt;In discussion with my supervisor this morning, she informed me that the bulk of her job is sitting in meetings during which she advises and takes advice on the subject of metadata implementation. 
It is essential that I know the ins-and-outs of sitting in front of the computer and actually creating metadata, but the most important part is actually knowing enough to find ways to make that metadata work the way that the users and/or the administration of your library wants it to work.&lt;br /&gt;&lt;br /&gt;I'm not sure I'm at that point yet, but I look forward to starting on that path during the rest of this semester.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112801038662532724?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112801038662532724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112801038662532724' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112801038662532724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112801038662532724'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/tei-mods-xml-xslt-oh-where-or-where.html' title='TEI, MODS, XML, XSLT: oh where or where did  the real words go?'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112761158348401031</id><published>2005-09-24T20:25:00.000-05:00</published><updated>2005-09-24T20:30:55.506-05:00</updated><title type='text'>Article review #2: Michael Gorman on authority control</title><content type='html'>During one of my rare Web browsing 
moments over the past couple of days, I found a reference to a &lt;a href="http://mg.csufresno.edu/papers/Authority_Control.pdf"&gt;presentation&lt;/a&gt; by &lt;a href="http://mg.csufresno.edu/"&gt;Michael Gorman&lt;/a&gt; for the &lt;a href="http://www.sba.unifi.it/ac/en/program.htm"&gt;2003 International Conference on Authority Control&lt;/a&gt;. Since he is the president of the ALA, an organization to which I belong and in whose membership I take pride, I decided to read the speech.&lt;br /&gt;&lt;br /&gt;Michael Gorman is first and foremost a respected (and rightly so) cataloger. His list of accomplishments in this field makes me wish that when I reach that point in my own career I have a list at least half as long as his. His thesis is a defense of the power of the library catalog, which during the past few years has been under attack from all sides. This is an admirable defense and one that the president of the ALA ought to make. However, I was troubled by his outright dismissal of metadata: he voices an extremely contemptuous view of this integral aspect of any contemporary Information System, one that is only growing in importance with each succeeding technological generation. But let's not dwell on the negative, since most of his points are ones on which everybody (regardless of where in the library world they stand) can agree.&lt;br /&gt;&lt;br /&gt;His discussion of the central importance of authority files is important. Authority files have in the past proven remarkably effective in solving the most basic problem in libraries (letting users find exactly what they are looking for). Given the audience of this speech, Mr. Gorman is perfectly correct and indeed admirable in his eloquent defense of this oft overlooked but integral part of the library catalog. 
Based on his description alone, it is obvious that the solution to the crisis of information organization on the Internet is to attempt to define some manner of authority control (or the best facsimile at least) for at least the significant information out there. Mr. Gorman's defense of authority control as a centralized power structure for documents, and of subject headings as more than just free-form text, is well spoken and important for everyone in the library world to heed.&lt;br /&gt;&lt;br /&gt;Mr. Gorman continues the discussion by dismissing metadata. To paraphrase, he calls for an outright abandonment of metadata efforts (I have to assume this is primarily meant in satire). He offers two examples as proof for this call. The first is the laughably simple Dublin Core, which does have many positive aspects despite its lack of granularity. The second is the failure of the Google search engine to differentiate between himself and at least two other Michael Gormans out in the world. Mr. Gorman attacks the lack of any content standards in the Dublin Core scheme as the primary reason for this assault on its character. It is true that DC has a significant failing in its lack of a content standard, but it attempts to make up for this as best it can through the best-practice usage of many of its 15 unqualified elements. Whether it succeeds at this is up for debate, and many very intelligent people keep this discussion alive. It is appropriate to attack Dublin Core: attacks are just what DC needs in order to strengthen its armor and make it relevant for the contemporary and the future needs of content providers on the Web. However, when he attacks Google, I believe he does so not incorrectly but less than perfectly correctly. Google's link indexing is an attempt (albeit a frequently faulty one) at authority control that utilizes the decentralized nature of the Web. 
To attack Google on the basis of its lack of organized search results and call that a lack of authority control is, I suspect (although I am in no way a true master of this topic), not perfectly correct. Google's link searching algorithm is a means towards authority control; it is why Google is currently one of the most popular search engines available, although it is fast being made obsolete by newer, nimbler players out there (like &lt;a href="http://clusty.com/search?query=%22michael+gorman%22"&gt;Clusty&lt;/a&gt; and &lt;a href="http://www.kartoo.com/flash04.php3"&gt;KartOO&lt;/a&gt;). I suspect that the reason, in 2003, that Mr. Gorman found Google a poor search engine for his own name is that at the time there simply were not very many links to his own webpage. This is not a problem anymore. The newer players in the search engine industry utilize a much more sophisticated algorithm than Google, and one based on metadata I suspect, to better organize search results. Clusty, the better one I think, organizes these into predefined categories (much like subject headings) that display in the left-hand column of the user's window; KartOO displays a visual map of the information, showing the primary search term in the center and the many options to drill down in the search as planets in orbit around this search term sun. The issues that Mr. Gorman brings up are very significant, and they are concerns that need to be voiced over and over again by authorities in librarianship like him in an effort to improve on those issues, not dismiss them entirely.&lt;br /&gt;&lt;br /&gt;Mr. Gorman is an esteemed and incredibly intelligent man, and I will not presume to pass judgment on his ideas. After all, his central thesis of defending authority control is indeed a watch fire for everyone involved in the effort to make online resources available to the public. 
He also has many, many years more experience in librarianship than I do, and if his list of publications isn't enough to convince a person that he knows what he is talking about, his primary achievement as first editor of AACR2 and its 1988 revision should satisfy. However, I see cause for much more optimism about the future of metadata; after all, already we are making incredible strides with the likes of &lt;a href="http://www.openarchives.org/"&gt;OAI&lt;/a&gt; and &lt;a href="http://www.loc.gov/standards/mods/"&gt;MODS&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112761158348401031?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112761158348401031/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112761158348401031' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112761158348401031'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112761158348401031'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/article-review-2-michael-gorman-on.html' title='Article review #2: Michael Gorman on authority control'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112745648021011961</id><published>2005-09-23T03:20:00.000-05:00</published><updated>2006-07-24T11:29:21.910-04:00</updated><title type='text'>Quick thought...</title><content 
type='html'>After reading Mark Ludwig's &lt;a href="http://www.libraryjournal.com/article/CA266430.html"&gt;article&lt;/a&gt;, and &lt;a href="http://texadata.blogspot.com/"&gt;clicking &lt;/a&gt;&lt;a href="http://hangingtogether.org/"&gt;through &lt;/a&gt;&lt;a href="http://outgoing.typepad.com/outgoing/"&gt;various &lt;/a&gt;digital librarians' blogs, I am more convinced than ever that it is very much possible to open wide the doors of every library in the world (at least digitally) to every living person on the globe. What's stopping us besides, in the paraphrased words of a very great man, our own hesitation? There are caveats a-plenty, yes, but caveats are just friendly bits of advice not outright warnings against.  I find myself agreeing wholeheartedly with &lt;a href="http://texadata.blogspot.com/"&gt;Brian Surrat&lt;/a&gt; on &lt;a href="http://texadata.blogspot.com/2005/07/take-my-metadata-please.html"&gt;open access&lt;/a&gt; (and on an unrelated note I must send kudos for his choice of blogger template).  He makes what I think is an obvious prediction when he divines that in a few years every library record will be searchable on whatever search engine the consumer likes best.  I do wonder what happens to any controlled vocabulary in search engines, but then I see &lt;a href="http://www.scirus.com/srsapp/search?q=metadata&amp;ds=jnl&amp;amp;ds=nom&amp;ds=web&amp;amp;g=s&amp;amp;t=all"&gt;this&lt;/a&gt; and I don't really worry that much. 
I can only scratch the surface of what programming and metadata tricks are involved in presenting a refined list of controlled vocabulary options in a search results screen like that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112745648021011961?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112745648021011961/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112745648021011961' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112745648021011961'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112745648021011961'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/quick-thought.html' title='Quick thought...'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112733390119226433</id><published>2005-09-19T21:30:00.000-05:00</published><updated>2005-09-21T15:18:21.193-05:00</updated><title type='text'>Um, er, uh, snuh...Fedora rocks?!?</title><content type='html'>If my nonsensical subject line isn't enough to convey my general feelings after the afternoon o' meetings, then perhaps I should attempt to be eloquent and elucidate my feelings. In the broadest sense, I am confused and desperately want to know more.&lt;br /&gt;&lt;br /&gt;Firstly, the digital library world is rife with acronyms. Why is this? 
I spent most of the time jotting down what I thought might be the proper spelling for acronyms that the rep tossed out casually as if everybody in the room ought to know what they were. What's worse, everybody in the room really did know them perfectly. Now I just feel bad and have a sick feeling of needing to play catch-up.&lt;br /&gt;&lt;br /&gt;The project looks incredible, though. It's also unbelievably complicated for whoever gets the lucky honor of actually implementing it. As the rep kept repeating, Fedora isn't really a suite of applications or an automation system of any kind. If your digital library website were a skyscraper, Fedora would be the iron beams that keep everything rising up so beautifully into the sky. Without it, your digital library will be just a cute little bungalow with a beautiful but isolating view of the sea or rolling green hills or whatever environment is around your library. You have to install the sinks, toilets, electricity and, um, floors, ceilings and walls (not to mention the carpet and paint and other aesthetic decisions) that make everything look nice and attractive for the people who inhabit your skyscraper.&lt;br /&gt;&lt;br /&gt;From what I can gather, Fedora consists of four basic components that every object in the system must have: metadata, disseminators (the stuff that makes cool applications like page-turning and searching work), PIDs (the unique identifiers that tell the system that, yes, this object is distinct from the clump of code sitting next to it on the server) and, of course, datastreams (which is a fancy way of saying the digital manifestation of the thing you're making available to the users). With these things, the Fedora system connects all the objects together and follows instructions programmed by the DLP staff to store and make available the resources your library has.&lt;br /&gt;&lt;br /&gt;The second meeting was even more complicated and technical. 
The two staff members at the IU DLP put in charge of implementing Fedora for the IU digital library asked a series of questions that largely went over my head. I did come away with one cogent thought that was hammered into my head repeatedly during the afternoon: metadata (system and descriptive) is at the heart of everything that Fedora wants to accomplish in establishing the semantic web. Everything has to have metadata.&lt;br /&gt;&lt;br /&gt;That's it for now. I suppose I'll become more knowledgeable about Fedora as I come to use it at some point during this internship. Very exciting! Scary, but exciting still.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112733390119226433?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112733390119226433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112733390119226433' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112733390119226433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112733390119226433'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/um-er-uh-snuhfedora-rocks.html' title='Um, er, uh, snuh...Fedora rocks?!?'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112733135191385416</id><published>2005-09-19T13:59:00.000-05:00</published><updated>2005-09-21T15:09:18.710-05:00</updated><title type='text'>End of the MODS mapping in sight...</title><content type='html'>I spent the morning creating a sample XML document for the MODS mapped archives photograph scheme that I had created. I have to admit that actually implementing it caused me to reconsider some choices I had made.&lt;br /&gt;&lt;br /&gt;I also wonder how useful the records as they stand might be for the user searching for them from a home computer. What is the ultimate goal of this collection? The database as it stands is of primary use to those who work in the IU archives and really isn't meant to be used by the general public. The information it holds is meant to be proprietary to the archives.&lt;br /&gt;&lt;br /&gt;My recommendations are legion. Firstly, I would like to see the addition of access points like title and creator. Also, is it possible to split the Photographer or Studio field into two separate fields? As is, the situation is needlessly confusing. Why restrict it to either a photographer or a studio? Might there be an instance when the photographer (for example) worked for a studio? More information to prove the authenticity of the photograph ought to be a good thing. Coincidentally, this solution would solve my problem over who is the creator. The photographer is the creator, while the studio is (perhaps) the publisher. In the case of privately created photographs the creator would be the only thing known and the publisher field would be left blank. Also, the placement of image subjects and keywords and the series, subseries, folder entry points, which are apparently direct references to the location of the photographs on the shelves of the archives, is redundant. 
In the digital world, the series, subseries and folder descriptions are in fact subject access points. The only issue is figuring out a way to make those locally defined subjects globally accessible. How do you transform preexisting subjects into a controlled vocabulary? I really hope there is some sort of automated software to do this, though I doubt it.&lt;br /&gt;&lt;br /&gt;This took up the whole morning, which was actually enough. It gave me the chance to acclimate myself to the Oxygen XML editor that I will no doubt be using extensively in this internship. My supervisor had just returned from the &lt;a href="http://ismir2005.ismir.net/"&gt;ISMIR&lt;/a&gt; conference in London, so my being stuck in the lab fiddling with XML editors left her with time to reintegrate into the swing of things at the DLP without worrying about giving me a project or telling me about my voluminous mistakes.&lt;br /&gt;&lt;br /&gt;Kidding, of course!&lt;br /&gt;&lt;br /&gt;In the afternoon, I have the opportunity to sit in on a couple of meetings with a representative of the &lt;a href="http://www.lib.virginia.edu/digital/"&gt;University of Virginia&lt;/a&gt; &lt;a href="http://www.fedora.info/"&gt;Fedora&lt;/a&gt; project. 
From what I've read already of a whitepaper by Carl Lagoze on the subject, it should be very exciting, and apparently I'll be on something of the same level as my supervisor on this particular learning endeavor.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112733135191385416?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112733135191385416/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112733135191385416' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112733135191385416'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112733135191385416'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/end-of-mods-mapping-in-sight.html' title='End of the MODS mapping in sight...'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112655936053443589</id><published>2005-09-12T18:08:00.000-05:00</published><updated>2005-09-12T16:09:20.910-05:00</updated><title type='text'>Article review #1: Carl Lagoze's Keeping Dublin Core Simple</title><content type='html'>I've read Carl Lagoze's still highly relevant article. It's a good one to keep in mind, even though I found myself wanting desperately to disagree with him in every other paragraph. 
Still, by the concluding paragraph I found that despite myself I was nodding along with his main point.&lt;br /&gt;&lt;br /&gt;Mr. Lagoze's central argument is that because the Web is such a vast, tangled and untamed metropolis that has been built like so many mining towns in the wild West, it does not benefit anybody to introduce that much complexity (read: granularity) to widely used metadata schemes. Mr. Lagoze's primary reason for this argument is that too much complexity interferes with resource discovery. He concedes that sometimes specialized communities need a high degree of granularity in order to do their jobs, but it does not behoove us to introduce that specialized granularity into the wide world of the Web; rather it is better to keep that granularity within the limited scope of the particular community that needs it.&lt;br /&gt;&lt;br /&gt;To support his thesis, Mr. Lagoze describes Dublin Core as pidgin metadata--metadata that strips out all the fancy grammar that a proficient user of any particular language (read: research community granularity) might use in favor of utilitarian functionality for non-native speakers (tourists taking cruise ships over the Web ocean). In declaring the benefits of this pidgin metadata, chief among them being that it is cheap and relatively easy to produce, Mr. Lagoze holds up the Dublin Core as a paragon of virtue that serves this purpose admirably and ought not to be sullied with fancifying grammar.&lt;br /&gt;&lt;br /&gt;I like his main point: that simple metadata that is usable by the majority of end-users is to be valued more than complex metadata that is usable only by a minority of end-users. As I have struggled through the past two metadata mapping projects, I have come to realize that the highest degree of granularity just isn't relevant to typical, day-to-day end-user resource discovery. The metadata librarian must reach a happy medium in the scheme that he creates. 
Her scheme is only as valuable as the aid it offers  all users to find the resources they want and need.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112655936053443589?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112655936053443589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112655936053443589' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112655936053443589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112655936053443589'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/article-review-1-carl-lagozes-keeping.html' title='Article review #1: Carl Lagoze&apos;s Keeping Dublin Core Simple'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112655444252839722</id><published>2005-09-12T16:47:00.000-05:00</published><updated>2005-09-15T10:01:41.576-05:00</updated><title type='text'>Why MODS definitely isn't one of my fav-o-rite things right now</title><content type='html'>I've been working on the project I first discussed last post: the mapping of the database scheme for the IU Archives photography collection into MODS all morning. It's a frustrating exercise requiring a very high tolerance for ambiguity.&lt;br /&gt;&lt;br /&gt;The first problem is, of course, with the IU Archives database. 
As one might expect from an archival institution, provenance is a very significant purpose for this database. One might even make the argument that clarifying the provenance of and legal rights to the material in the collection is the only purpose of this database and that providing ready access to the materials through the database is a secondary purpose at best.&lt;br /&gt;&lt;br /&gt;There are in fact two possible collection numbers: one that is assigned within the archival institution and another assigned by the donor of the collection yet kept for some reason. Could this be a holdover from Jenkinson, who put the original collector/administrator of the collection upon a pedestal and put the larger purpose of the archivist on a lower pedestal?&lt;br /&gt;&lt;br /&gt;But, wait, there is more. There is also no easily recognizable primary access point. What serves as the access point in this collection is a series of drop down menus labelled Collection (the broadest possible categorization, which is simply IU Archives), Series (a slightly more focused topical organization such as Buildings &amp;amp; Grounds, People, perhaps Faculty and Students, et cetera), Subseries (drilling down considerably now into a more focused description such as Andrew Wylie House, likely organized by individual building, location, person, et cetera) and finally, where it is warranted, Folders (a way of organizing a large Subseries collection into much more focused and specific sub-subseries such as Back of Andrew Wylie house). I have mapped all of these into the subject top-level element of MODS and given instructions to keep it to topical, person, or temporal data elements. I hope that this is satisfactory. 
I can only assume that it will be since I have managed to maintain the provenance information of the collections.&lt;br /&gt;&lt;br /&gt;This has not been easy to do within the MODS scheme, since the purpose of MODS is to provide universal access points from MARC21 records and not necessarily maintain provenance information to a very precise degree. Provenance being a secondary (if at all conscious) need of the average user, this makes a lot of sense. The typical user of an I/S can be assumed to be looking for information useful to them, not for information whose pedigree can be tracked from beginning to present day. For this reason, MODS has no direct rights element: just a lonely top-level data element called accessCondition that implies much more about privacy policies, use conditions and the like than copyright holders, deeds of gift and other such functionaries.&lt;br /&gt;&lt;br /&gt;This long description of my mapping woes has been building towards this succinct explanation: that putting the IU photograph archives into MODS has been a valuable learning experience as I twisted myself into knots to understand just what it is each of these schemes values the most and how to make the latter work for the former.&lt;br /&gt;&lt;br /&gt;I have done it though, and there goes another notch on my metadata belt: original metadata mapping. 
It's still too big, but the belt is starting to get kinda to the point of fitting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112655444252839722?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112655444252839722/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112655444252839722' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112655444252839722'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112655444252839722'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/why-mods-definitely-isnt-one-of-my-fav.html' title='Why MODS definitely isn&apos;t one of my fav-o-rite things right now'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112629249278927321</id><published>2005-09-09T16:00:00.000-05:00</published><updated>2005-09-09T14:01:32.873-05:00</updated><title type='text'>End of 2nd week wrap-up</title><content type='html'>The ball is really starting to roll this week. I've had my first real meeting with my internship supervisor during which we discussed what I submitted for my first project.&lt;br /&gt;&lt;br /&gt;She was very impressed with how quickly and professionally I churned out the metadata mapping and she responded by giving me more work. This new project (also mapping) is a bit more ambiguous than the last one. 
For one thing, I'm not half as familiar with MODS (the metadata scheme I have to map to) as I am with Dublin Core. But, that's the whole point of an internship, right? I am here to learn not only the what of my chosen profession but part of the how as well.&lt;br /&gt;&lt;br /&gt;The project is very interesting nonetheless. I'm supposed to map an existing scheme in the IU Archives for their collection of documentary photography to MODS.&lt;br /&gt;&lt;br /&gt;The problem: the local scheme is focused on the physical location of the items themselves and the access points are therefore only truly understandable to the archivists themselves. There is no concept of digital resources, and so the way I see my job in this is to create an appropriate metaphor of the digital images as manifestations of the photographic expressions of the subjects pictured that must be readily available to people logging into the system from anywhere in the world. That means, of course, I must find (read: create) ready access points in a database where the closest current example is Series. Don't misunderstand me, though: I understand how this system has probably worked perfectly well for the purposes of the Archives, past and present. But, my job is to try to prepare it for the future and in that future there will not likely be as much of a focus on local collections as there has been or is currently.&lt;br /&gt;&lt;br /&gt;Global &lt;em&gt;is &lt;/em&gt;the word of the day, ladies and gentlemen!&lt;br /&gt;&lt;br /&gt;Looking over this, I am amazed at how I almost seem to know what I'm talking about. 
Just two weeks into my last semester here at school and I've started to discover that personal core of professional values that every mentor I've ever had has told me about.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112629249278927321?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112629249278927321/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112629249278927321' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112629249278927321'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112629249278927321'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/end-of-2nd-week-wrap-up.html' title='End of 2nd week wrap-up'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-16359864.post-112593357873601454</id><published>2005-09-05T11:57:00.000-05:00</published><updated>2005-09-05T10:34:52.650-05:00</updated><title type='text'>It begins...</title><content type='html'>With the beginning of the second week of classes, I have my first project. I'm supposed to map a crosswalk from an IU metadata standard for sheet music to simple Dublin Core.&lt;br /&gt;&lt;br /&gt;But, before I can do that I have to learn just what mapping metadata entails. 
That's where &lt;a href="http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?CrosswalkingLogic"&gt;this &lt;/a&gt;and &lt;a href="http://www.dlib.org/dlib/december04/godby/12godby.html"&gt;this &lt;/a&gt;come in.&lt;br /&gt;&lt;br /&gt;My first response to the local metadata scheme is that it is going to be problematic to map it into simple Dublin Core. How do you show hierarchical relationships in Dublin Core? I can show the title of the work in front of me as well as the various contributors to this work. But, what do I do with a field like "title of larger work"? It would be very easy to put something like this in dc:description and leave it at that. But, if I do that with that one element, dc:description is going to be a very confusing and highly repeatable element with multiple and varied bits of information in it. I suppose this is one of the reasons why not many people decide that they want to get involved with metadata: it's one thing to develop a scheme for your own local use, but it's quite another thing to make the stuff getting described by that scheme usable or even find-able by folks in extra-local systems.&lt;br /&gt;&lt;br /&gt;In any case, this is why I like metadata. 
It really is like a jigsaw puzzle and suddenly I understand why I loved those things so much when I was a kid.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/16359864-112593357873601454?l=metadataintern.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://metadataintern.blogspot.com/feeds/112593357873601454/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=16359864&amp;postID=112593357873601454' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112593357873601454'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/16359864/posts/default/112593357873601454'/><link rel='alternate' type='text/html' href='http://metadataintern.blogspot.com/2005/09/it-begins.html' title='It begins...'/><author><name>Tyler</name><uri>http://www.blogger.com/profile/01364525266510874914</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
