Vivold Consulting

OpenAI solicits real task files from contractors to benchmark and train next-gen AI agents

Key Insights

OpenAI has asked contractors to upload real past job deliverables to help it evaluate next-generation AI agents against human performance. Workers must anonymize sensitive information before uploading. The initiative is part of establishing benchmarks for real-world task capabilities.

Benchmarking AI against real human work


OpenAI is quietly collecting actual deliverables from contractors' past jobs (Word docs, spreadsheets, PDFs) to create realistic benchmarks for how well its upcoming AI agents can handle complex, real office tasks. This isn't about synthetic datasets; it's about grounding evaluation in real professional work.

What's unusual about this project


- Contractors are being asked to upload actual work artifacts, not summaries, with personally identifiable information redacted, a shift from simulated or artificial evaluation sets.
- The goal is to compare AI agent performance to human baseline outputs across a spectrum of real tasks, from analysis to creation.

Why this matters for developers and businesses


Benchmarking against real work could yield harder performance targets and clearer signals about where AI still lags human professionals. For enterprises evaluating AI agents for automation, these metrics could be decisive in procurement and deployment decisions. But collecting past deliverables also raises questions about data privacy, consent, and corporate IP boundaries in training and evaluation workflows.
