20 must-ask questions when evaluating AI data services providers

Tomáš Burkert 25 Jan 2024
The pressure to explore, develop and productize artificial intelligence (AI) and machine learning (ML) has never been greater than it is today. Within their organizations, many data scientists, ML engineers and product managers are expected to build new models or expand the capabilities of existing ML models.
 
This pressure can be especially palpable when in-house data acquisition (whether it involves data collection, data annotation or both) becomes untenable. Organizations are then forced to turn to a professional data services provider for AI training data that incorporates the required cultural and market knowledge or subject matter expertise at scale.

Trusting a third-party AI data services provider with your projects and data can be a little daunting. To help, we’ve prepared 20 must-ask questions to ensure you select the best AI data services provider for your project.

1. Does the AI data services provider have experience and expertise with the data workflows and data types I need?
 
A provider could be excellent at computer vision but may lack expertise in natural language processing (NLP) or large-scale data collection. Always make sure their experience is relevant to your goals.
 
2. Does the provider have strong quality assurance processes that can be adapted to my project goals and ensure high-quality data and annotations?
 
The right candidate should have strong principles and processes covering data consistency, alignment of workers, sampling methodologies and quality assurance in general.
 
3. Does the AI data services provider have access to technology that will improve the quality and accelerate the delivery of my training data?
 
Platforms and tools built for efficient data collection and annotation are essential to ensure the quick turnaround of high-quality deliverables. 
 
4. How much does the data services provider know about their community or crowd? Are they able to deploy the right workers for my AI training data needs?
 
No amount of training or instruction is going to help if the data workers assigned to your project are not a good fit for the given domain, task type or general task complexity.
 
5. What are the standard sourcing, qualification and training processes that the provider uses, and are they flexible enough to support my use case?
 
Sourcing and onboarding the right resources for your project can be challenging, which is why the way your data services provider goes about it should instill trust in you and your team.
 
6. What does the provider say and do to ensure their workers are treated fairly and ethically? And are there any indications that their claims might not reflect reality?
 
The unethical treatment of AI data services workers has sparked public outrage and even lawsuits. Carefully scrutinize your provider’s operations and practices to make sure you’re not indirectly contributing to worker exploitation.
 
7. How diverse is the provider’s community? Can I expect a highly diverse team of workers to be assigned to my project, especially if I have specific demographic requirements?
 
One of the primary concerns when acquiring AI training data for machine learning is minimizing the bias and blind spots that a homogeneous worker pool can introduce.
 
8. Can the AI data services provider quickly and reliably scale if my project grows significantly?
 
Since one of the most common reasons to engage with a data services provider is the inability to scale internally, it goes without saying that scalability is vital. Check that your provider’s technology, infrastructure and ability to deliver can scale as your needs grow.
 
9. Can I trust the provider with my data, confident in the knowledge that they will always follow all applicable laws and privacy and security best practices?
 
While this won’t be an issue with most providers, you should watch out for bad apples since any lapses in this area can have serious financial, legal and reputational consequences.
 
10. Is the AI data services provider able to provide me with a clear and realistic timeline and turnaround time for my project?
 
The inability to provide a project timeline is a red flag. But beware of overly optimistic or outright unrealistic timelines or turnaround times as well.
 
11. How will I be informed of the progress of the project? Does the provider have a strong project management team and good communications practices?
 
Make sure you will always have good visibility into what’s happening with your data and the progress being made on your project.
 
12. Does the AI data services provider approach the partnership in a consultative manner? Will they be able to support me if there are changes to the project that require creative solutions?
 
It’s not uncommon, especially when building new AI models, for your training data needs to require adjustments or even complete pivots. Your data services provider should be a valued partner, not a hindrance, in managing those changes.
 
13. Is the provider asking the right questions? Are they proactively suggesting alternative ways to achieve my goals and trying to improve processes? 
 
The best AI data services providers will proactively work with you to optimize workflows and share their expertise with you.
 
14. What are the provider’s capabilities to support specialized workflows involving sensitive data such as personally identifiable information (PII) or protected health information (PHI)?
 
If your project deals with sensitive data, knowing where your data will reside and who will have access to it is critical.
 
15. Can the provider adapt to my processes and tooling, and tailor the solution to my needs?
 
There’s nothing worse than an AI data services provider trying to squeeze you into their cookie-cutter processes and tooling. Double-check that your provider is flexible and accommodating.
 
16. Is it possible to run a quick pilot project to test data task design and validate whether the project is set up correctly?
 
Both parties will benefit from a pilot run (or proof of concept) – doubly so for complex projects.
 
17. Is the pricing that I’m being presented with transparent and predictable, without any unknown factors or fine print?
 
The simpler the better! Know what you’re paying for.
 
18. Does the AI data services provider offer services specific to generative AI including large language models (LLMs) and large multimodal models (LMMs)?
 
This may indicate whether they are staying on top of recent industry developments and evolving to address emerging client needs.
 
19. If the AI model will be rolled out in multiple markets or languages, how strong is the provider’s geographic reach, their community and knowledge of local markets? Will they struggle to support me in reaching my customers globally?
 
AI has the potential to break down language and cultural barriers. If you’re aiming to support multiple markets with the objective of being truly global, choose an AI data services provider that can support you equally well on Italian, Icelandic and Igbo. 
 
20. Can the AI data services provider support me on AI-related tasks adjacent to data collection and annotation?
 
Whether it’s consulting, data preparation, instructional design, dataset validation, prompt engineering, reinforcement learning from human feedback (RLHF), red teaming or locale-specific support, being able to rely on your existing provider to meet your additional needs can save you time, budget and potential headaches.
 
Evaluating AI data vendors to train or fine-tune your generative AI? Download these questions in a handy question-answer checklist and get your project off to the right start.
Author

Tomáš Burkert

TrainAI Solutions Consultant, RWS
Tomáš is a Solutions Consultant in RWS's TrainAI data services practice, which delivers complex, cutting-edge AI training data solutions to global clients operating across a broad range of industries. His mission is to understand even the most complex client needs and work with the TrainAI team to successfully design, execute and deliver a wide range of AI data services projects.
 
Tomáš has over a decade of experience in localization and several years of experience serving major tech clients in the AI data services space. He holds a master's degree in English Language Translation from Masaryk University in Brno, Czechia.