The XML-Files case 002: Content Duplication and Ghosts

Paul Movileanu, Senior Inside Sales and Account Manager | 09 Apr 2025 | 10 minutes
In this second episode of our series, inspired by agents Mulder and Scully, we continue to delve into the mysteries of the content management space. Like the federal agents, we tirelessly explore phenomena that defy normal logic and human reason. Dark unstructured forces attempt to hinder our efforts to illuminate these paranormal content phenomena, but we always manage to overcome them. After investigating why many companies lack a structured content management strategy, a twisting trail of clues led us to another great mystery: the connection between content duplication and ghosts.

The Second Mystery: Content Duplication and Ghosts

We've all heard horror stories about otherworldly creatures haunting antique castles and driving their inhabitants mad. These creatures, often referred to as ghosts, are believed to be the souls of humans who have experienced some curse or misfortune and are stuck between worlds, unable to find peace. Ghosts appear in different places at different times, with no apparent connection between sightings. You never know if it's the same ghost or a different one.
 
A similar situation exists in the realm of content management. Ghosts don't just dwell in isolated castles; they also haunt the modern digital world. Content is typically created within the boundaries of a specific department or team, with a writer crafting a text based on certain data and for a particular purpose and audience. This content is stored as a file on an intranet or internal file-based storage system. But when you need to repurpose that content for a different goal, team, audience, or context, copy/pasting seems the easiest way to achieve the new content piece. You access the old file, copy/paste the text into a new file, and voila, there's your new piece of content. Sometimes it's exactly the same text, sometimes it's just minor edits, and other times you need to include the text in a larger piece. Practically, there's a new instance of the same text or an updated version, stored again as a new file and published via some publication channel.
 
So, you have a piece of text that behaves like a ghost: it lives in a siloed world where it was initially meant to be, but then it keeps appearing and disappearing in other places. You can think of a content ghost as a piece of content that no one knows exists; it shows up every now and then, floating around, with nothing linking to it; it's not really anchored in the material world. Due to its immaterial and fleeting nature, you cannot find it when you need it, and there is no semantic or referential connection to track it down. Hence the necessity to recreate a new piece of content with the same siloed parameters. A new ghost, if you will. This is how ghost sightings multiply, leading to multiple pieces of isolated, unreferenced, unused content. You see the ghosts, you feel spooked. You read duplicated, siloed content, and it makes sense, yet you often feel that something is off.
 
Content duplication relies heavily on copy/paste. First, you need to find the piece of content to copy into a new document. This is a manual, time-consuming task that can prove challenging in enterprise settings with thousands of documents and pages. Copy/paste is useful for certain tasks: replicating a few lines, a paragraph, or a table while working on a document, or producing a quick fix in a specific, low-level tactical situation. But it is not the right tool for managing content at scale in a governed, compliant way. As the number of tactical copy/paste situations grows, you find yourself at odds with the efficiency principles and best practices of content management, such as COPE (Create Once, Publish Everywhere).
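To make the contrast concrete, here is a minimal Python sketch of copy/paste versus reuse by reference. It is an illustration only, not how any particular CCMS works: the component ID, the sample text, and the `render` helper are all invented for this example.

```python
# Hypothetical illustration: the component ID, text, and helper are invented.
components = {
    "warranty-note": "The warranty covers parts for 12 months.",
}

# Copy/paste: each document stores its own copy of the text.
user_guide_copy = components["warranty-note"]
faq_copy = components["warranty-note"]

# Reuse by reference: each document stores only the component ID.
user_guide = ["warranty-note"]
faq = ["warranty-note"]


def render(doc, store):
    """Assemble a document by looking up each referenced component."""
    return "\n".join(store[component_id] for component_id in doc)


# One update at the source...
components["warranty-note"] = "The warranty covers parts for 24 months."

# ...reaches every document that references the component,
print(render(user_guide, components))
print(render(faq, components))
# while the pasted copies still carry the old text.
print(user_guide_copy)
```

The referenced documents pick up the change automatically, while each pasted copy has to be found and fixed by hand. That gap is where the ghosts come from.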
 
Seeing a ghost once or twice can be eerie and even fun. But if hauntings multiply, the fun factor disappears, and you start wondering what's going on. It's the same with using copy/paste as the main tactic for content creation and management. Once the ghosts multiply, you can no longer make sense of the random apparitions, you cannot find the right information at the right time in the multitude of documents, it's increasingly difficult to do updates, variations, and versions, and the whole content management process becomes a mess. You start realizing you are in the midst of a haunting predicament, a mystery if you will.
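For the technically curious, a rough way to go ghost hunting in a folder of plain-text files is to fingerprint every paragraph and look for fingerprints that show up in more than one document. The Python sketch below assumes paragraphs are separated by blank lines; the file names and sample text are invented. It only catches copies that are identical after whitespace and case are normalized, so lightly edited ghosts will slip past it.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(paragraph):
    """Hash a paragraph after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", paragraph).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents):
    """Map each repeated paragraph fingerprint to the documents containing it."""
    seen = defaultdict(set)
    for name, text in documents.items():
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                seen[fingerprint(paragraph)].add(name)
    return {fp: sorted(names) for fp, names in seen.items() if len(names) > 1}


# Hypothetical sample documents, invented for this sketch.
documents = {
    "install-guide.txt": "Unplug the device before cleaning.\n\nInstall the driver.",
    "faq.txt": "Unplug the  device before cleaning.\n\nContact support.",
}

for names in find_duplicates(documents).values():
    print(names)
```

A script like this can tell you where the duplicates are, but it does nothing to stop new ones from appearing.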

A Case Study: Haunted Software Companies

Software companies have thrived over the last decades. The mass use of information technology devices and the internet has led to massive growth for companies providing the less material aspect of our technology-driven society. This also means that these companies have increasingly complex requirements for their content management process. Software companies typically need to manage a lot of digital content. They need to:
  • Create online help systems to offer support and training to end users of their software
  • Create online knowledge bases where frequently encountered problems are described and solved
  • Publish technical documentation around the development and quality assurance of their products
  • Create product information and marketing collateral that aligns with their business goals and the technical features of their products
  • Manage online forums, user communities, and intranets where technical and usage information is exchanged
Many of these companies use tools like Jira (software development and issue tracking), GitHub (version control and collaborative software development), and SharePoint (web-based document management, storage, and collaboration) to create and manage technical content. While these are great tools for their main use cases and are easier to adopt at first, using them for enterprise content management at scale can prove ominous.
 
Content process efficiency is often an afterthought with these tools, especially when it comes to scalability, governance, collaboration, reuse, AI options, and multi-channel delivery. Trying to improve the process with the same kind of unstructured tools often turns out to be a futile exercise, similar to chasing ghosts or starting a witch hunt. These are some of the challenges frequently heard from companies serious about overhauling their content management process:
  • They lack real control and visibility over the end-to-end process
  • Authoring is done in a siloed way with no governance or auditing systems in place
  • Collaboration is difficult to achieve in real-time, typically done via emails, MS Word, or Google Docs sharing
  • It's difficult to bring SMEs into the review and approval process
  • Automated reuse is not baked into the fabric of the process from the beginning, so writers turn to manual methods like copy/pasting or rewriting whole documents
  • Implementing semantic capabilities and GenAI tools is challenging as these require structure and metadata as foundational layers to the content
  • Multi-channel and multi-endpoint publishing is almost impossible to scale and automate as workflows are manual and linear
 
The phrase "the ghost in the machine" refers to the possibility of a system or machine developing a mind of its own and behaving erratically. In this case, multiple ghosts haunt the content management machine at multiple points of the process, turning it into a juxtaposition of disjointed efforts and tasks, resulting in inefficiency and lack of productivity. But, just like ghosts can be tracked down and sent to the realm of spirits, content management problems can be solved with a structured content approach and the use of a Component Content Management System (CCMS).

Mystery Solved

Ghosts can be exorcised and put to rest, and content management problems can be fixed. Ghosts are rare occurrences, but content management problems are frequent and easily noticeable in the business world. It shouldn't be a mystery how to fix content problems like out-of-control duplication, extensive use of copy/paste in content creation, lack of visibility into the process, and wasted resources: implement a structured content strategy based on a CCMS. How can a CCMS help exorcise some of these problems?
 
Content duplication issues arise when:
  • You don't have a central repository for your content, a single source of truth to turn to: with a CCMS like Tridion Docs, check
  • You have a central content repository, but it's document-based and not component-based, making it harder to find existing information and do automated content reuse: with a CCMS like Tridion Docs, check
  • You have a central repository and a team of writers, but no clear strategy or process, no governance: with a CCMS like Tridion Docs, check
  • You have a repository and a process, but lack modern, automated, AI-based tools for semantic tagging, taxonomy-based categorization, semantic and granular search, and findability: with a CCMS like Tridion Docs, check
  • You have all the above, but lack a modern delivery platform, and your main publishing format is still PDF or Word documents, making it hard for content consumers to have a modern experience: with a CCMS like Tridion Docs, check
In conclusion, don't wait for ghost sightings to multiply. Start looking at how a CCMS can bring efficiency and modern capabilities to your content and documentation operations.
Paul Movileanu
Author

Senior Inside Sales and Account Manager
Paul is currently managing inbound requests for RWS Tridion CMS and helping prospective customers in their journey towards adopting structured content and CCMS software. With a background in translation and foreign languages, he has extensive experience working for RWS at the intersection between sales, marketing, and business development.