Baby steps towards knowing XSLT even better...
This morning I worked on an XSLT stylesheet to transform a slew of Excel spreadsheets into a collection of MODS records. This in itself is no difficult task; transforming anything into MODS metadata is practically second nature at this point in the semester, and I think transforming Excel spreadsheets into something as simple as MODS ought to be a couple-of-hours-on-a-Tuesday-afternoon job. However, in this case, there are a couple of complications that will slow things down a bit.
The most important thing is the state of the data that has been received. Out of 35 individual spreadsheets for approximately that many individual Nobel Peace Prize winners (and a couple of random collections of photographs just to spice things up), there are 6 different description schemes, with some interchangeable elements and some elements unique to each scheme. The other, much more problematic, obstacle to easy transformation is that while the spreadsheets are human-readable (with all sorts of nifty spacing, padding and big bold titles), all of this human-readability makes them effectively unreadable to a machine. This adds an extra step to making the data machine-friendly: I have to go through all 35 Excel workbooks, break everything up into separate worksheets, and eliminate all those extra spaces that mess with the XML export.
I suppose this is the big lesson of this project: I ought to just expect messy data. The second lesson is that messy data is why things like <xsl:if> and <xsl:choose> exist. Of course, there are a myriad other things to do with these wonderfully versatile tools (the hammer of the XSL language), but for what I'll be doing with them, they exist to deal with messy original data.
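To make that concrete, here is a rough sketch of how <xsl:choose> and <xsl:if> might cope with rows that follow different description schemes. The input element names (rows, row, Title, Caption, Date) are made up for illustration and are not the real export from these workbooks; only the MODS element names and namespace are the standard ones. The "[untitled]" fallback is likewise just a placeholder choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input shape: <rows> holding one <row> per spreadsheet line. -->
  <xsl:template match="/rows">
    <mods:modsCollection>
      <xsl:apply-templates select="row"/>
    </mods:modsCollection>
  </xsl:template>

  <xsl:template match="row">
    <mods:mods>
      <mods:titleInfo>
        <mods:title>
          <!-- Schemes disagree on where the title lives, so take the
               first non-empty candidate; normalize-space() also strips
               stray padding left over from the spreadsheet. -->
          <xsl:choose>
            <xsl:when test="normalize-space(Title)">
              <xsl:value-of select="normalize-space(Title)"/>
            </xsl:when>
            <xsl:when test="normalize-space(Caption)">
              <xsl:value-of select="normalize-space(Caption)"/>
            </xsl:when>
            <xsl:otherwise>[untitled]</xsl:otherwise>
          </xsl:choose>
        </mods:title>
      </mods:titleInfo>

      <!-- Only some schemes record a date; skip the wrapper entirely
           when the cell is missing or blank. -->
      <xsl:if test="normalize-space(Date)">
        <mods:originInfo>
          <mods:dateCreated>
            <xsl:value-of select="normalize-space(Date)"/>
          </mods:dateCreated>
        </mods:originInfo>
      </xsl:if>
    </mods:mods>
  </xsl:template>

</xsl:stylesheet>
```

The division of labor is the point: <xsl:choose> picks among alternatives when the same information can turn up under different names, while <xsl:if> simply leaves an element out when a scheme has nothing to put in it.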