Continuing from the recent Tridion webinar, this is the second post in the blog series ‘Structured Content Authoring and Semantic AI’. The first post discussed the need for structured content and semantic AI.
Jan Benedictus from RWS’s Fonto division discussed how traditional content can be converted into structured content using semantics, and how artificial intelligence (AI) can then be deployed to deliver the excellent user experience that businesses aim for but often fail to achieve.
To recap, Jan brings about 30 years of experience in the online and digital publishing domain. He founded Fonto in 2014 to make structured content authoring available to everyone. RWS acquired Fonto in March 2022 to bring its expertise and user-friendly structured content creation, editing, and review tools to RWS and Tridion customers.
Characteristics of structured content
Starting with the basics, Jan defined the key characteristics of structured content. He contrasted structured content with traditional content, e.g., structured content is a collection of content assets instead of monolithic documents.
These characteristics make structured content ideal to reuse and repurpose across different formats and devices.
Unstructured content vs. structured content – key differences
Unstructured content consists of lines of content that may or may not have styling. Jan mentioned that several authors are very organized and style their content using titles, headings, list items, and normal/body text formats. These typically include standard templates with standard font sizes, font colors, and line spacing.
Such styling or formatting makes it easy for the user to browse and digest the content. However, it still does not count as structured content; as noted above, structure is not about styling but about semantic tagging. Without such tagging, the content is neither reusable nor discoverable and can’t be repurposed.
Styled content illustration:
Semantic tagging and metadata add hierarchy and turn unstructured content into structured content. Jan shared an illustration of structured content with a visible hierarchy. In the case of a document, the hierarchy begins from the outer box (document) and then narrows down to a section, paragraph, block, and inline.
A structured content illustration:
As seen in the above illustration, the content is nested into components. This adds a standard and consistent structure to the content. By simply looking at the design or the underlying XML, one can figure out the relation, i.e., which content is a part of (nested into) which content; hence the term structured content.
In the above illustration, the document is no longer a single monolithic whole; adding structure turns it into a collection of content assets, which can also be called components.
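To make the nesting concrete, here is a minimal sketch in Python. The element names (document, section, title, paragraph, block, inline) are assumptions chosen to mirror the hierarchy described above, not a schema shown in the webinar. The sketch walks a small XML sample and prints the path of every component, which shows which content is nested into which.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names mirroring the hierarchy described above.
SAMPLE = """
<document>
  <section>
    <title>Safety</title>
    <paragraph>Take <inline>one tablet</inline> daily.</paragraph>
    <block>
      <paragraph>Store in a dry place.</paragraph>
    </block>
  </section>
</document>
"""


def nesting_paths(xml_text):
    """Return one slash-separated path per element, in document order."""
    root = ET.fromstring(xml_text.strip())
    paths = []

    def walk(element, prefix):
        # Build the path from the parent's path plus this element's tag.
        path = f"{prefix}/{element.tag}" if prefix else element.tag
        paths.append(path)
        for child in element:
            walk(child, path)

    walk(root, "")
    return paths


for path in nesting_paths(SAMPLE):
    print(path)
```

Each printed path reads from the outer box inward, for example document/section/paragraph/inline, so the parent of any component can be read directly off its path.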
Content assets in a document illustration
Evidently, these content assets or components are granular and hence extremely valuable to the business. Users can easily find, reuse, and repurpose them. E.g., internal users (employees) can quickly and easily locate the content, saving them time and effort. They don’t need to rewrite the existing content and can easily repurpose or tailor it to their specific objective or application.
External users (customers, vendors, shareholders, etc.) can easily find the information they want to make decisions. It helps businesses to add transparency, build trust, and deliver a rich experience that external users need to remain loyal.
In short, structured content creates a win-win situation for all the business stakeholders, helping drive revenue and profitability in a highly constructive and efficient way.
Adding semantic tags to convert content into data assets
Jan explained that while metadata does add structure by describing the attributes of the content, this is easier to accomplish when the content is tabular. Adding structure becomes challenging when the content is free-form text.
To overcome this, Jan pointed back to the hierarchical way of structuring the content where there are components, i.e., titles, sections, list items, etc. In semantic tagging, each of these components is tagged. He shared a healthcare content example where the sentence ‘Wear a medical mask’ is tagged as “Warning”, while a product name ‘Aspirin’ is tagged as “Product”.
Semantically tagging each unit:
To take this tagging to the next level, Jan added a reference to the “unified definition” of that warning, or a reference to the product database to which Aspirin belongs. This adds relation and context to the data not just at the organizational level but also at a broader database level, thus empowering authorized users from anywhere in the world to quickly access and reuse the information they want.
Another benefit is a reduced risk of error: because the content is linked to the database, the user can look up more detailed information about the data there.
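As a rough sketch of what such tagging could look like in practice, the Python example below uses the ‘Wear a medical mask’ and ‘Aspirin’ examples from the webinar. The tag names, the ref attribute, and the reference values are all assumptions made up for illustration; they are not the markup Jan showed. The function pulls every tagged unit that carries a reference out of the text as a small data record.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag and attribute names; the ref values are invented
# placeholders for a unified definition and a product database entry.
TAGGED = """
<section>
  <warning ref="warnings/medical-mask">Wear a medical mask</warning>
  <paragraph>Take <product ref="products/aspirin">Aspirin</product> with water.</paragraph>
</section>
"""


def extract_assets(xml_text):
    """Collect every element that carries a reference as a data record."""
    root = ET.fromstring(xml_text.strip())
    assets = []
    for element in root.iter():
        ref = element.get("ref")
        if ref is not None:
            assets.append({
                "type": element.tag,
                "text": "".join(element.itertext()).strip(),
                "ref": ref,
            })
    return assets


for asset in extract_assets(TAGGED):
    print(asset)
```

Once the content is available as records like these, a system can follow each reference to the shared definition or product entry instead of relying on the wording in the sentence.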
Semantic tagging converts content into data
Using data assets to generate custom content across formats/devices/languages
The whole point of having data assets is to use them to generate the custom content that businesses want for their users. Jan explained this with an example of generating a Word document from the data assets. It is as simple as converting the semantics into the style that we want. Using the same example as above, the data in a tag can be put into a box with a frame.
The conversion can include conditions to tailor content for any given purpose, e.g., to the country or the audience, thus generating multiple outputs from the same set of underlying data assets.
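The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the audience attribute, the tag names, and the plain-text box are hypothetical stand-ins for a real publishing pipeline and a real Word style. The function maps the warning tag to a framed box and keeps only the content whose condition matches the requested audience.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "audience" attribute is an assumed way of
# attaching a condition to a component.
SOURCE = """
<section>
  <warning>Wear a medical mask</warning>
  <paragraph audience="patient">Ask your pharmacist if unsure.</paragraph>
  <paragraph audience="clinician">Check for contraindications.</paragraph>
</section>
"""


def render(xml_text, audience):
    """Render plain text for one audience from the shared source."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for element in root:
        condition = element.get("audience")
        # Skip components written for a different audience.
        if condition is not None and condition != audience:
            continue
        text = "".join(element.itertext()).strip()
        if element.tag == "warning":
            # Semantics to style: a warning becomes a framed box.
            border = "+" + "-" * (len(text) + 2) + "+"
            lines.extend([border, f"| {text} |", border])
        else:
            lines.append(text)
    return "\n".join(lines)


print(render(SOURCE, "patient"))
print()
print(render(SOURCE, "clinician"))
```

Both outputs come from the same source; only the condition passed to the function changes, which is the sense in which one set of data assets yields multiple tailored outputs.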