Strategic spending: Optimizing your AI data budget for generative AI success

Lou Salmen Senior Business Development Director, TrainAI 13 Feb 2024

5 minute read

When shopping for AI data services, you might ask yourself the following questions: How much does the data, content or feedback required to train or fine-tune generative AI cost? What issues can cause costs to increase? And how can those issues be overcome?

It can be difficult to understand the costs associated with training and fine-tuning generative AI. Pricing models often vary wildly, with some fees baked in and others completely hidden. Furthermore, the required spend and volumes are often unknown upfront, making it difficult to secure the best possible price amid the uncertainty.

Generative AI training and fine-tuning demand different resources with greater specialization. The power it brings is immense, as are the risks. Here are four key factors to consider when building your generative AI training budget.

Resource compensation

As a first step, it’s important to understand what data tasks you’re asking workers to do. Are they providing feedback on how well-written a response is? Are they evaluating news posts about current events in Australia? Are they checking responses about heart valve surgery for factual accuracy?

The tasks that resources are asked to perform directly impact their compensation. The more complex the task, the greater the compensation required, which gives you significant insight into potential project costs.

Even for the simplest of tasks, workers must be paid a fair wage for their work. This means paying at least minimum wage in countries that have an established minimum wage. But keep in mind that, in many countries, minimum wage does not equate to a livable wage, so paying more than minimum wage may be the right and ethical thing to do.

Now consider, for example, the type of resource needed to label the machine parts of a rocket ship. Because few workers have this unique expertise, hourly compensation for such a resource will be significantly above average. Don’t assume all tasks are easy or the same.

Other questions around resource compensation to consider include:

Where do resources need to be located?
What language(s) should they speak?
Do they need to specialize or have expertise in a particular domain?
How many total resources will be needed?

Your answers to each of the above questions will also impact the total resource compensation required for your generative AI training or fine-tuning project.

Project ramp-up

Generative AI processes are still being figured out. It’s not uncommon to work through an entire dataset only to discover that none of the data is usable due to a lack of awareness or miscommunication between your AI team, your AI data services provider and the resources working on your project.

Take the time upfront to work with your provider to develop an accurate and comprehensive resource testing and training strategy. Many projects that don’t do this upfront are forced to use higher-rate resources on the project simply because insufficient time was allocated to evaluating and aligning the most cost-effective resources.

A typical project ramp-up will include a pilot project. This is a critical step in validating assumptions made throughout the project estimation process. A sample of the overall dataset will help you validate the accuracy of estimated costs, ensure that the right resources are being used and the output quality is appropriate. Using insights gained during a pilot project, your AI data services provider can often provide per-unit pricing to help you more accurately calculate costs and maximize your budget.

AI data services processes

Consider an AI data services process where one resource is performing a task on one data point. This process is typically referred to as ‘single annotation’. Now consider a process where two resources are performing the same task on the same data point. This is referred to as a ‘consensus approach’ and is double the cost. This seems logical and straightforward, but many AI teams fail to connect the process they need to the effort required.
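To make the link between process and effort concrete, here is a minimal sketch in Python. The function name `annotation_cost`, the per-unit price and the dataset size are hypothetical values chosen for illustration, not figures from any provider. The sketch also assumes every annotation pass is billed at the same per-unit rate.

```python
def annotation_cost(num_items: int, price_per_annotation: float,
                    annotators_per_item: int = 1) -> float:
    """Estimate labeling cost when each data point is handled by
    `annotators_per_item` resources (1 = single annotation, 2 = consensus)."""
    if num_items < 0 or price_per_annotation < 0 or annotators_per_item < 1:
        raise ValueError("counts and prices must be non-negative, "
                         "with at least one annotator per item")
    # Every additional annotator repeats the same task on the same data point,
    # so the number of billable annotations scales linearly.
    return num_items * annotators_per_item * price_per_annotation


# Hypothetical project: 10,000 data points at an assumed 0.25 per annotation.
single = annotation_cost(10_000, 0.25, annotators_per_item=1)
consensus = annotation_cost(10_000, 0.25, annotators_per_item=2)
print(f"single annotation: {single:.2f}")
print(f"consensus approach: {consensus:.2f}")
```

Under these assumptions the consensus figure is exactly twice the single-annotation figure, which is the relationship described above. Any adjudication step for resolving disagreements would add further cost that this sketch does not model.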

Quality assurance (QA) is another required process that many fail to account for when budgeting. If data services vendor payment is contingent on them delivering 100% data accuracy, your vendor will have to perform QA on virtually all of the data before it is delivered. By switching to a partner-based approach where you collaborate with your data services provider to establish a practical QA objective and framework based on task complexity and project budget, you can cut costs while still receiving the quality data you need to ensure generative AI project success.
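The effect of the QA objective on budget can be sketched in the same way. The names `items_to_review` and `qa_cost`, the review percentages and the per-review price below are all hypothetical, and the sketch assumes QA is billed per reviewed item. The right review rate for a real project depends on task complexity and budget, as agreed with your provider.

```python
import math


def items_to_review(num_items: int, review_percent: int) -> int:
    """Number of delivered items that go through QA at a given review rate."""
    if num_items < 0 or not 0 <= review_percent <= 100:
        raise ValueError("num_items must be >= 0 and review_percent in 0..100")
    # Round up so a non-zero review rate never rounds down to zero items.
    return math.ceil(num_items * review_percent / 100)


def qa_cost(num_items: int, review_percent: int, price_per_review: float) -> float:
    """Estimated QA cost, assuming a flat (hypothetical) price per reviewed item."""
    return items_to_review(num_items, review_percent) * price_per_review


# Hypothetical comparison: reviewing everything versus a 20% sample.
full_review = qa_cost(10_000, 100, price_per_review=0.10)
sampled_review = qa_cost(10_000, 20, price_per_review=0.10)
print(f"QA on all items: {full_review:.2f}")
print(f"QA on a 20% sample: {sampled_review:.2f}")
```

The sketch only shows how QA effort scales with the share of data reviewed. It says nothing about the accuracy achieved at each rate, which is what a pilot project would help you measure.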

Technology tools

Many companies have developed their own technology tools to manage the process of training or fine-tuning their generative AI, and it’s become the norm for AI data providers to work with those tools. However, despite being well built, these tools are often focused primarily on the client’s needs without considering the data provider’s processes, which means providers must use alternative, more manual processes to complete their work.

For example, a client’s system might not allow an AI data services provider to perform QA on a portion of tasks after they’ve been submitted. Ensuring data quality without the ability to perform QA in the client’s system requires additional upfront testing and training of resources outside of that system, which comes at an additional cost. Instead of being tested and trained on real data that you can use immediately, resources must work on mock tasks that produce no usable data output yet still incur costs. Requiring your AI data services provider to use your proprietary tool is acceptable, but it’s important to consider how their process might be affected and how your project budget may be impacted as a result.

Set the right budget and accelerate your journey to generative AI success

Budgeting for training or fine-tuning generative AI can be challenging. But careful consideration of resource compensation, project ramp-up, data services processes and technology tools will enable you to estimate your project costs more accurately. As you navigate the rapidly evolving AI landscape, developing a thorough understanding of and planning for each of these considerations will contribute to a successful outcome for your generative AI project.

Not sure how to source the AI data you need to train and fine-tune your AI model? Download our AI data sourcing decision tree to get started.

Lou is Senior Business Development Director of RWS’s TrainAI data services practice, which delivers complex, cutting-edge AI training data solutions to global clients operating across a broad range of industries. He works closely with the TrainAI team and clients to ensure their AI projects exceed expectations.

Lou has more than 15 years’ experience working in sales and business development roles in the AI, translation, localization, IT, and advertising sectors. He holds a bachelor’s degree in Entrepreneurship/Entrepreneurial Studies from University of St. Thomas in St. Paul, Minnesota.