One of the most valuable concepts in the world of data and analytics is making data "readable" by both machines and humans. Data that is only human-readable is great for cartoons, and endless lines of code work just fine for making software run, but the real magic happens when computers and humans get on the same page.
One of the best ways to make information in data sets transparent to humans and machines is with XSLT (eXtensible Stylesheet Language Transformations). It's kind of like what happens behind the scenes to make websites readable. If you're curious, hit the F12 key in most desktop browsers to see the code behind this article.
If we can do the same thing as a website - take a whole lot of hard-to-understand data and present it in a way that's easy to understand - we can turn more data into actionable and valuable information.
The code that makes websites work is called HTML, and the code behind many electronic documents (including Microsoft Office files) is called XML. XML shows up on websites too, but its purpose is different: it separates information from the presentation layer. Basically, it is a language that explicitly defines what each data element is. This is very different from HTML.
So in HTML, an <h2> tag marks text as a heading, but it doesn't say what that text is. XML specifies what the text is. For example, <productCategory title="OCR Software for Oil and Gas"> defines exactly what the text represents.
Just like there are standards for website code, there are also standards for the way XML is formatted. There are specially defined characters, markup rules, tags, elements, and attributes that all help to bring data to life. You can learn more about XML here.
Essentially, XML formats data so there can be no question about the information it represents. Once data is in an XML format it's no longer locked inside a proprietary computer language.
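To make that concrete, here's a minimal sketch in Python, using only its standard library, that reads the productCategory element from the example above. The one assumption is that the element is written as self-closing so it parses on its own. The point is that a program can ask for the data by name instead of guessing from the formatting.

```python
import xml.etree.ElementTree as ET

# The element from the example above, written as self-closing
# (an assumption) so that it is well-formed on its own.
xml_text = '<productCategory title="OCR Software for Oil and Gas"/>'

element = ET.fromstring(xml_text)

# The tag name says what the data is; the attribute holds the value.
print(element.tag)           # productCategory
print(element.get("title"))  # OCR Software for Oil and Gas
```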
Here's where XSLT comes into play. Because XML is designed to separate data from the presentation of the information, it needs a bit of processing to become more transparent. Take a look at some XML code for a document below:
Example XML Code
<Document Id="94b9a646-f926-4a7e-9df3-3f70d88fcd80" Name="Generic Invoice (1)" TypeId="22684b3b-e266-4350-bfdb-96bf90bae207" TypeName="Acme">
<Field Name="Invoice Number" Confidence="1.00" Page="1" Valid="True" Location="5.103, 7.030, 1.927, 0.093">74449788</Field>
</Document>
XSLT is a language used to transform that code into web pages, plain text, or other XML formats. Paired with a formatting tool, the output can also become beautiful PDF documents or even printer-friendly PostScript. And the great thing is that the information can be arranged in virtually any desired layout.
Information from multiple XML "documents" can even be combined into a single final document. This aids in the creation of dashboards and reports which must pull data from more than one source. Because XML is so precisely formatted it acts as a sort of "single version of the truth." The data contained in XML can be used to match and integrate information from external databases or document repositories.
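Here's a rough sketch of that idea in Python, using only the standard library. It is not XSLT, just an illustration of pulling fields from more than one XML document into a single report. The first document is a trimmed copy of the invoice example above (with a closing tag added). The second document, and the summarize helper, are made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# First document: trimmed copy of the invoice example from this article.
invoice_a = '''<Document Name="Generic Invoice (1)" TypeName="Acme">
  <Field Name="Invoice Number" Confidence="1.00">74449788</Field>
</Document>'''

# Second document: hypothetical, invented for this sketch.
invoice_b = '''<Document Name="Generic Invoice (2)" TypeName="Acme">
  <Field Name="Invoice Number" Confidence="0.87">00000000</Field>
</Document>'''


def summarize(xml_text):
    """Return one report row per Field element in a document."""
    doc = ET.fromstring(xml_text)
    return [
        {
            "document": doc.get("Name"),
            "field": field.get("Name"),
            "value": field.text,
            "confidence": float(field.get("Confidence")),
        }
        for field in doc.iter("Field")
    ]


# Combine rows from both sources into a single report.
report = summarize(invoice_a) + summarize(invoice_b)

for row in report:
    print(f'{row["document"]}: {row["field"]} = {row["value"]} '
          f'(confidence {row["confidence"]:.2f})')
```

Because every field is labeled the same way in both documents, the same few lines work on each source without any special handling.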
One of the easiest ways to think about XSLT is that it's simply a template for displaying specific information from more complex XML data sets. The first step is getting data into XML format. Then, with a tool that performs XSLT transformations, you can create documents with accurate data laid out in any given format or structure.
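The two steps above can be sketched in a few lines of Python. To be clear, this is a plain-Python stand-in for the template idea, not real XSLT: a real setup would use an XSLT stylesheet and an XSLT processor. The PAGE and ROW templates and the render function are hypothetical names chosen for this sketch, and the XML is the invoice example from this article with a closing tag added.

```python
import xml.etree.ElementTree as ET
from html import escape

# Step 1: data in XML format (the invoice example, closing tag added).
xml_text = '''<Document Id="94b9a646-f926-4a7e-9df3-3f70d88fcd80" Name="Generic Invoice (1)" TypeId="22684b3b-e266-4350-bfdb-96bf90bae207" TypeName="Acme">
<Field Name="Invoice Number" Confidence="1.00" Page="1" Valid="True" Location="5.103, 7.030, 1.927, 0.093">74449788</Field>
</Document>'''

# Step 2: a "template" that decides which values appear and where.
# These names are hypothetical and stand in for an XSLT stylesheet.
PAGE = "<html><body><h2>{title}</h2><table>{rows}</table></body></html>"
ROW = "<tr><td>{name}</td><td>{value}</td></tr>"


def render(xml_source):
    """Fill the HTML template with selected values from the XML."""
    doc = ET.fromstring(xml_source)
    rows = "".join(
        ROW.format(name=escape(field.get("Name")), value=escape(field.text or ""))
        for field in doc.findall("Field")
    )
    return PAGE.format(title=escape(doc.get("Name")), rows=rows)


html_page = render(xml_text)
print(html_page)
```

Notice that the template only asks for the document name, the field name, and the field value. Everything else in the XML (confidence, page, location) stays in the data but out of the finished page, which is the same selective display an XSLT template provides.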