The XML-Files case 003: Unidentified Semantic Objects in Area 51
06 May 2025
6 minutes


This third installment of our series brings us to the lion’s den, the place that holds the answers to many of the mysteries investigated by our predecessors Mulder and Scully. This is a secluded place where secretive experiments occur far away from the public eye and where the occasional intruder faces the extreme dangers of the unknown and the unstructured. Following in the footsteps of the federal agents, we managed to break into this highly secretive government facility and discovered, in awe, that the rumors are true: we could see the aliens with our own eyes! The aliens we saw, however, look nothing like E.T.; they take the shape of content objects that lack semantic structure and tagging. What follows is an accurate report of our encounter with these peculiar content beings within the confines of Area 51.
The Third Mystery: Unidentified Semantic Objects in Area 51
After getting inside Area 51, dodging barbed wire, armed guards and infrared security cameras, we found ourselves in front of the main facility: a huge, siloed building whose facade spelled out in Gothic lettering: ENTERPRISE CONTENT MANAGEMENT. The mere sight of it was enough to send shivers down our spines! The building was just what you would expect of such a place:
- Run-down, siloed, unconnected
- People running around to complete their work tasks in an incoherent, uncoordinated manner
- Lacking basic automation and reuse capabilities
But what really stood out was waiting for us inside those gloomy walls. Hovering in the giant hallways, floating high above the ground, were pieces of content that lacked any semantic tagging or reference. These unidentified semantic objects (or USOs) drifted through the vast expanse of corporate knowledge much like UFOs in the night sky, eluding identification and proper management. They cause confusion and inefficiency: without semantic tagging they remain unidentifiable, which makes valuable information difficult to find, manage, and reuse, and leads to wasted time, duplicated effort, and missed opportunities. Just as Mulder and Scully sought to uncover the truth behind UFO sightings, businesses need a solution to identify, manage, and reuse these elusive pieces of content.
There is more, however. Sneaking further into the secretive government building, we ended up in a room full of paper documents stamped Confidential. These seemed to be in a language that none of us had seen before, but luckily for us, with the help of our RWS Language Weaver neural machine translation tool, we were able to instantly decode and understand the documents. What we had suspected all along proved to be true: there is a conspiracy among unstructured content tool makers to hide the truth about the benefits of semantically structured content. These tools perpetuate the chaos by failing to provide the necessary structure and tagging, leading businesses to believe that unstructured content is the norm. However, we, the feds at RWS Tridion Docs, always strive to reveal the truth: semantically structured content can significantly enhance the efficient and automated creation, management, and publication of technical content in business settings. The agents of these unstructured content tool makers spotted us and tried to take us down, but we managed to elude them by using the following powerful weapons.
Mystery solved: structure down those USOs!
Luckily for us, as we were fighting against the malevolent guards, state agents and other less structured groups, we knew that the Tridion Docs CCMS had put powerful weapons at our disposal. These weapons are semantic tagging, taxonomies, ontologies, knowledge graphs, and AI, and we used them to push back our enemies and bring down those troublesome USOs. Here’s how these weapons function.
Semantic tagging is the first and most basic weapon you can use against USOs. It does wonders against these unstructured alien content beings and allows you to categorize and label pieces of content according to predefined semantic categories. Traditional search-and-find works by matching keywords. This is no longer enough in today’s business world, where knowledge workers need quick access to conceptual or procedural information rather than to keyword-based documents. Semantic tagging can be fired manually, automatically, or in a hybrid human-in-the-loop mode.
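For agents who like to inspect a weapon before firing it, here is a minimal sketch of the hybrid human-in-the-loop mode. It is an illustration only, not how Tridion Docs tags content: the categories, trigger terms, `suggest_tags` function and `min_hits` threshold are all hypothetical. The tagger proposes categories automatically and flags weak matches for a human to confirm.

```python
from dataclasses import dataclass

# Hypothetical categories and trigger terms, invented for this example.
# A real deployment would draw them from the corporate taxonomy.
CATEGORY_TERMS = {
    "installation": {"install", "mount", "setup"},
    "troubleshooting": {"error", "fault", "reset"},
}


@dataclass
class TagSuggestion:
    category: str
    hits: int            # number of trigger terms found in the text
    needs_review: bool   # True when a human should confirm the tag


def suggest_tags(text, min_hits=2):
    """Suggest categories for a piece of content.

    Categories with fewer than `min_hits` matching words are still
    suggested, but flagged for human review (the hybrid mode).
    """
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    suggestions = []
    for category, terms in sorted(CATEGORY_TERMS.items()):
        hits = sum(1 for w in words if w in terms)
        if hits:
            suggestions.append(
                TagSuggestion(category, hits, needs_review=hits < min_hits)
            )
    return suggestions


topic = "Install the bracket, then mount the unit. If an error appears, contact support."
for suggestion in suggest_tags(topic):
    print(suggestion)
```

In this sketch the topic is tagged "installation" outright, while "troubleshooting" rests on a single matching word and is therefore routed to a human reviewer.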
The use of the semantic tagging weapon is closely linked to that of taxonomies. USOs shouldn’t exist in a company’s content because all of it should be underpinned by a semantic classification model, or corporate taxonomy. Taxonomies apply structure to content components and allow you to establish semantic connections between them. Taxonomies can be built from scratch by taxonomists or other linguistically skilled employees or consultants, but it is also possible to adopt an external, sector-specific taxonomy from providers that offer such products, e.g. a pharma taxonomy or an insurance taxonomy.
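To see why a taxonomy makes tags more useful than flat keywords, consider this small sketch. The pharma terms and the `BROADER`, `ancestors` and `matches` names are made up for illustration and do not come from any real taxonomy product. Each term points to its broader term, so content tagged with a narrow term is also found by a search on any broader one.

```python
# Hypothetical taxonomy fragment: each term maps to its broader term.
BROADER = {
    "oncology drugs": "prescription drugs",
    "prescription drugs": "pharma products",
    "vaccines": "pharma products",
}


def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def matches(tag, query):
    """A tagged item matches a query on its own term or any broader term."""
    return tag == query or query in ancestors(tag)


print(ancestors("oncology drugs"))
print(matches("oncology drugs", "pharma products"))
print(matches("vaccines", "prescription drugs"))
```

A topic tagged "oncology drugs" is returned for a search on "pharma products", while a topic tagged "vaccines" is correctly left out of a search on "prescription drugs".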
Ontologies are the next-level weapon when it comes to fighting lack of structure and USOs. Like taxonomies, they are a categorization framework, but they go further by mapping the relations between entities, with benefits in areas such as knowledge management, data integration, semantic search and the use of AI and ML tools. Having a corporate ontology means you can start thinking about using advanced warfare techniques against the green USO invaders, such as knowledge graphs.
A knowledge graph is the equivalent of a nuclear bomb in our continuous fight against the evil forces of unstructured content. As a representation, often explored visually, of the network of entities (objects, concepts, events, states) existing in a domain or knowledge area, a knowledge graph imposes structure and semantic relationships on what was previously just a pool of unstructured, unrelated content. We used these powerful tools to wreak havoc on the evil government troops that were threatening us with their unstructured content explosions. Even so, to escape the danger for good, we had to turn to the most talked-about weapon on earth: AI.
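For the curious, a knowledge graph can be sketched as a list of subject, relation, object triples. The entities, relation names and `related` function below are invented for illustration and say nothing about how Tridion Docs stores its graph. Walking the graph a few hops out from one entity surfaces related content that plain keyword search would miss.

```python
from collections import deque

# Hypothetical triples (subject, relation, object), invented for this example.
TRIPLES = [
    ("Pump X200", "hasComponent", "Filter F1"),
    ("Filter F1", "maintainedBy", "Replace filter procedure"),
    ("Pump X200", "documentedIn", "X200 user guide"),
]


def related(entity, max_hops=2):
    """Breadth-first walk returning {entity: hops} within `max_hops`.

    Relations are followed in both directions; the start entity
    itself is excluded from the result.
    """
    adjacency = {}
    for subject, _relation, obj in TRIPLES:
        adjacency.setdefault(subject, []).append(obj)
        adjacency.setdefault(obj, []).append(subject)

    hops = {entity: 0}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, []):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return {name: n for name, n in hops.items() if name != entity}


print(related("Pump X200"))
```

Starting from the pump, the walk reaches the filter and the user guide in one hop and the filter replacement procedure in two, even though that procedure never mentions the pump by name.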
AI is the cannon that RWS Tridion agents use to shoot ballistic content missiles into the enemy camp, provided that it is trained on structured, semantically well-formed content. AI can serve as an assistant that reviews and improves the authoring process, power a RAG-based chatbot that connects users to the internal knowledge base, and help users navigate content in a semantic, visual manner through Hexahops. AI can lead to hallucinations and fake content, but when it is used by the highly competent RWS Tridion Docs agents and trained on structured DITA XML content, it is a highly effective and devastating content weapon.
The End
Just as Mulder and Scully attempted to uncover the truth behind UFOs, businesses must seek out structured content solutions to reveal and solve the mysteries of content management. By adopting a structured content approach with the implementation of a CCMS like RWS Tridion Docs, businesses can identify, manage, and reuse unidentified semantic objects, bringing order to the chaos and unlocking the full potential of their content. The content management truth is out there, and Tridion Docs is here to help you find it.