China Turns High-Quality Datasets Into the Next AI Infrastructure Race

Beijing wants datasets to become usable industrial infrastructure for AI models, but the plan will test whether data supply, pricing, annotation and compliance can become investable businesses.

By Jingpost Desk
Published Jun 10, 2026, 5:10 PM UTC

Jingpost reporting.

China is trying to make the least glamorous layer of artificial intelligence into a national industrial project: data that is clean enough, labeled enough and legally usable enough to train models at scale.

The National Data Administration has issued an action plan for high-quality datasets, setting a 2028 target for verified data resources across key fields. The language is technical, but the commercial meaning is direct. Beijing is no longer treating AI as only a contest of model releases, cloud capacity or chip supply. It is trying to organize the raw material that determines whether those models can work inside factories, laboratories, transport systems and public services.

That shift matters for investors because datasets are not just files. They require collection rights, cleaning, annotation, quality control, security treatment, pricing and repeat customers. A model developer can buy chips once and rent compute monthly, but it needs useful data continuously. If China can turn industrial and public-sector data into a more tradable input, a new layer of companies may form around annotation, data exchanges, domain-specific data products, compliance services and model evaluation.

The plan identifies sectors including scientific research, industrial manufacturing, low-altitude economy and embodied intelligence. Those choices say a great deal about Beijing's priorities. It wants AI to move beyond consumer chatbots and into production systems where better data could reduce defects, shorten design cycles, train robots, support autonomous equipment and make state-backed industrial policy easier to measure.

The problem is that abundance alone does not make data high-quality, and closing that gap is expensive. Factories have machine data, but it is often messy, private, inconsistent or trapped inside vendor systems. Hospitals and laboratories have valuable records, but privacy and professional standards make them difficult to commercialize. Local governments have administrative data, but quality and interoperability differ sharply across provinces. The plan therefore points to a market that may grow slowly even under strong policy pressure.

Data annotation is one early test. China has already encouraged national data annotation bases, and the industry is being pushed toward higher growth. Yet annotation is not a commodity task if the aim is industrial AI. A medical image, a factory defect record or a robotics motion sample may require expert judgment. That raises labor cost and makes quality assurance more important than headcount.

The financing language is equally revealing. The plan discusses commercialization and assetization of datasets, including pledge financing, equity contribution, asset-backed securities, data trusts and insurance. This is where ambition meets accounting. If banks and investors are asked to treat datasets as collateral or productive assets, they will need evidence of ownership, durability, buyer demand and legal enforceability. Without that, data assets can become another policy label that looks valuable on paper but is hard to underwrite.

For technology companies, the opportunity is not simply selling more software to the government. It is building tools that make data usable across industries: cleaning engines, labeling workflows, synthetic data controls, privacy-preserving computation, model-evaluation services and governance systems. The companies that win may be those that sit between domain owners and model builders, not necessarily the most visible consumer AI brands.

The risk is fragmentation. If every region, ministry and enterprise builds its own data format, the market will produce many pilots but few scalable products. If pricing is administrative rather than demand-based, private firms may struggle to build margins. If compliance rules are vague, buyers may avoid using valuable data in model training. The plan's real test is therefore not whether China can produce more datasets. It is whether those datasets become trusted enough for companies to buy, finance and reuse.

China's AI competition is often described through chips and large models. The dataset plan shows a quieter battlefield. In the next phase, the advantage may belong to the institutions that can convert messy operational records into lawful, priced and machine-readable assets. That is a harder story than model hype, but it is closer to how AI becomes an economy.
