NextFin News - On January 28, 2026, Google Cloud announced the launch of generative AI inference for third-party open models within BigQuery, a move that fundamentally alters how enterprises interact with large-scale machine learning. According to InfoQ, the new capability allows data teams to deploy and execute models from Hugging Face and Vertex AI Model Garden using plain SQL statements, effectively turning the data warehouse into a self-contained AI engine. The feature, currently in preview, automates the entire lifecycle of ML infrastructure, from provisioning compute resources and managing endpoints to cleaning up idle instances, removing the traditional requirement for specialized Kubernetes clusters or complex API integrations.
The technical implementation is designed for simplicity and cost-efficiency. Users can initiate a model deployment with a single "CREATE MODEL" statement, specifying a Hugging Face model ID such as "sentence-transformers/all-MiniLM-L6-v2". According to Google Cloud, the platform typically completes deployment within 3 to 10 minutes. Once the endpoint is active, inference runs through standard functions such as "AI.GENERATE_TEXT" or "AI.GENERATE_EMBEDDING", letting analysts process millions of rows without leaving the SQL environment. To control costs, BigQuery exposes an "endpoint_idle_ttl" option that automatically shuts down resources after a period of inactivity, a critical safeguard for batch workloads where idle GPU time would otherwise accrue significant cost.
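The workflow described above plausibly reduces to a pair of statements along these lines. This is a sketch only: the project, dataset, connection, and option names (including "model_type" and "hugging_face_model_id") are illustrative placeholders rather than confirmed preview syntax; only the "CREATE MODEL" statement, the Hugging Face model ID, the "endpoint_idle_ttl" option, and the "AI.GENERATE_EMBEDDING" function shape come from the announcement.

```sql
-- Deploy an open embedding model from Hugging Face (sketch; option names illustrative).
CREATE MODEL `my_project.my_dataset.minilm_embedder`
OPTIONS (
  model_type = 'DEPLOYED_MODEL',  -- hypothetical value for an open-model deployment
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
  endpoint_idle_ttl = INTERVAL 1 HOUR  -- tear down idle GPU capacity automatically
);

-- Batch inference over warehouse rows without leaving SQL
-- (function name per the announcement; argument shape assumed).
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL `my_project.my_dataset.minilm_embedder`,
  (SELECT review_text AS content
   FROM `my_project.my_dataset.product_reviews`)
);
```

The idle-TTL option is the key cost lever here: for a nightly batch job, the endpoint spins up for the run and is reclaimed afterward, so GPU billing tracks actual inference time rather than wall-clock availability.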
This development addresses a long-standing bottleneck in the data science pipeline: the "operational gap" between data storage and model execution. Historically, running open-source models required data engineers to move massive datasets out of the warehouse into separate ML environments, incurring egress costs and increasing latency. By bringing the models to the data, Google is targeting the 13,000+ text embedding models and 170,000+ text generation models available on Hugging Face, including Meta’s Llama series and Google’s own Gemma family. This integration is particularly timely as U.S. President Trump’s administration continues to emphasize American leadership in AI infrastructure and domestic technological self-reliance, fostering an environment where rapid enterprise adoption of AI is viewed as a competitive necessity.
From a market perspective, Google’s move is a direct response to the evolving "Data + AI" landscape dominated by the trio of Google, Snowflake, and Databricks. While Snowflake’s Cortex AI and Databricks’ Model Serving offer similar SQL-accessible inference, Google’s advantage lies in its deep integration with the Hugging Face ecosystem and its native Vertex AI infrastructure. For instance, a September 2025 benchmark showed that processing 38 million rows for embeddings could cost as little as $2 to $3 using these optimized patterns. This aggressive pricing and ease of use are designed to lower the barrier to entry for mid-market firms that lack the specialized DevOps talent to manage raw ML clusters.
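Taking the reported benchmark at face value, the implied unit economics are easy to sanity-check with a one-line query; using the $2.50 midpoint of the reported range:

```sql
-- Implied unit cost of the reported benchmark:
-- ~38 million rows embedded for roughly $2-$3.
SELECT 2.50 / 38000000 * 1000000 AS usd_per_million_rows;  -- ~0.066
```

That works out to roughly seven cents per million rows embedded, which illustrates why this pricing is positioned as an entry point for mid-market teams.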
Looking ahead, the trend toward "SQL-native AI" suggests that the role of the data analyst is merging with that of the ML engineer. As managed inference becomes a standard feature of cloud data warehouses, the focus will shift from infrastructure management to prompt engineering and model selection. We expect Google to expand this support to multi-modal models, including image and audio processing, within the next twelve months. Furthermore, as enterprises increasingly prioritize data sovereignty and security, the ability to run open-source models within the governed perimeter of BigQuery provides a compelling alternative to sending sensitive data to external LLM providers via public APIs. This shift not only optimizes performance but also aligns with emerging regulatory frameworks regarding data privacy and AI transparency.
Explore more exclusive insights at nextfin.ai.