Google Beam Goes Multi-Person: How the 3D Group Call Experiment Works

Google I/O 2026 extended Beam to multi-person calls. Here's the AI pipeline, the $24,999 display, and where the gaps are.

Creeta

May 30, 2026

From 1:1 to Many: What the Google Beam Group Experiment Adds

Google Beam is a 3D video communication platform that renders remote participants life-size on a light-field display, using AI-reconstructed depth to simulate physical co-presence across distance. Prior to May 2026, Beam was architected strictly as a bilateral system — a 1:1 format requiring a dedicated Beam device at both endpoints. The group meeting experiment announced at Google I/O on May 20, 2026 breaks that constraint: multiple remote participants joining from standard Google Meet or Zoom cameras can now appear life-size and spatially arranged around a virtual table on a single Beam endpoint, without any Beam hardware on their side. The capability is labeled explicitly as an experiment and is gated to a named list of enterprise partners — not a general product release.

Quick Answer: Google Beam's group meeting experiment, announced May 20, 2026 at Google I/O, lets multiple remote participants on Meet or Zoom appear life-size in 3D on a single HP Dimension endpoint — no Beam hardware required on the remote side. The experiment is limited to named enterprise partners (Deloitte, Salesforce, Citadel, NEC, Duolingo) with no GA date committed.

The architectural shift has real implications. The previous design required a Beam endpoint at both ends of a call — a constraint that limited deployments to scenarios where both parties had invested in dedicated hardware. The group experiment inverts that requirement: one Beam device serves as the receiving display, while multiple remote participants contribute standard 2D video from ordinary laptops or conference room cameras. According to the Google Research Blog announcement, AI models running server-side reconstruct spatial arrangement and map each participant to a position around a virtual table on the light-field display.

This is not a finished feature. The announcement uses the word "experiment" consistently throughout, language that signals active development and a higher tolerance for limitations than a production rollout would carry. Google has committed to wider trials "later this year" with no specific dates, pricing tiers, or support terms published. For developers and IT architects evaluating collaboration infrastructure, the experiment-versus-product distinction is material: compute architecture, API surface, and certification requirements for the group rendering stack are not publicly documented.

The HP Dimension: 65-Inch Light Field Display, Six Cameras, $24,999

The HP Dimension is the only commercially available Beam endpoint hardware as of May 2026. It is a 65-inch light-field display equipped with six integrated cameras, listed at $24,999 per installation. That figure covers hardware only; the Beam software license carries an additional, publicly undisclosed cost. The device requires a dedicated room with managed lighting. It is not compatible with a shared or multi-use conference room setup, and Google has not published certified room design guidelines. The entry cost, factored alongside facilities commitment, positions this as a capital infrastructure investment rather than a software procurement.

The six cameras capture the local viewer's position and gaze direction continuously, feeding head-tracking data into the depth rendering pipeline at 60 frames per second. This drives real-time adjustment of each remote participant's rendered depth as the in-room viewer shifts position, preserving perspective-correct parallax and gaze direction cues. Six simultaneous video streams from remote participants are merged by AI in real time, producing a single spatial scene rather than a tiled grid.

Specification	Detail	Notes
Display size	65 inches (light-field)	Life-size rendering of remote participants
Integrated cameras	6	Multi-angle capture for head tracking and depth continuity
Hardware list price	$24,999 per installation	Software license additional; cost undisclosed
Rendering frame rate	60 fps	Real-time AI stream merging with head-tracking
Room requirement	Dedicated room, managed lighting	Incompatible with shared or ad-hoc conference rooms
Remote participant hardware	Standard laptop or room camera	No Beam endpoint required on the remote side
Manufacturer	HP (sold as HP Dimension)	Beam software licensed separately from Google

The room constraint deserves attention in any deployment evaluation. A dedicated room with managed lighting is a meaningful physical and operational commitment: it cannot double as overflow meeting space, a hot-desk area, or a standard video-call room. For organizations with constrained real estate or flexible workspace policies, this limits the effective deployment footprint and adds facilities cost to any TCO model.
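The cost components named above can be combined into a rough total-cost sketch. Only the $24,999 hardware list price comes from the announcement coverage. The software license and room figures are hypothetical placeholders, the per-year billing shape is an assumption (license terms are undisclosed), and `endpoint_tco` is an illustrative name, not part of any Google or HP tooling.

```python
HP_DIMENSION_LIST_PRICE_USD = 24_999  # hardware list price cited in the article


def endpoint_tco(
    endpoints: int,
    software_license_usd_per_year: float,  # ASSUMPTION: undisclosed; supply your own quote
    room_cost_usd_per_year: float,         # ASSUMPTION: dedicated room and managed lighting
    years: int,
) -> float:
    """Rough total cost of ownership for Beam endpoints over a planning horizon."""
    if endpoints < 0 or years < 0:
        raise ValueError("endpoints and years must be non-negative")
    hardware = endpoints * HP_DIMENSION_LIST_PRICE_USD
    recurring = endpoints * years * (
        software_license_usd_per_year + room_cost_usd_per_year
    )
    return hardware + recurring


# Hypothetical inputs for illustration only: two rooms over three years.
estimate = endpoint_tco(
    endpoints=2,
    software_license_usd_per_year=5_000,
    room_cost_usd_per_year=10_000,
    years=3,
)
print(f"${estimate:,.0f}")
```

Keeping the undisclosed costs as explicit parameters makes it obvious which parts of the estimate rest on a vendor quote that does not yet exist publicly.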

According to VoIP Review's coverage of the I/O announcement, further hardware and integration developments may be announced at InfoComm 2026. That creates a procurement timing risk: early buyers of the current HP Dimension hardware may face a near-term refresh cycle with no published upgrade path or trade-in terms.

How Google Beam Reconstructs Spatial Position from Ordinary Cameras

Google Beam infers the 3D spatial arrangement of remote participants from standard 2D video feeds, with no depth sensors, lidar, or specialized remote hardware required. The system uses machine learning models trained on meeting room layouts to estimate each participant's position relative to others in the same remote room and, by extension, relative to the Beam-side viewer. Those inferred positions map to a virtual table rendered on the light-field display, realizing what Google describes as "perceptual equality": the design principle that remote attendees should carry the same perceived presence and social weight as in-room participants.

Writing in the Google Research Blog, Mohamed Abdelgany describes the spatial reconstruction approach: ML models trained to understand meeting room layouts reconstruct each participant's position from 2D video, enabling a virtual table experience where everyone appears at natural scale and depth — what the team calls "perceptual equality."

The reconstruction pipeline runs server-side, meaning the compute burden is not imposed on remote participants' hardware or network. A standard laptop camera capturing 720p or 1080p video is sufficient input. The AI models solve a non-trivial estimation problem: from a single 2D viewpoint, infer not just the approximate 3D position of each participant but also their orientation and relative spacing from others in the same frame. That inferred geometry is encoded into a form the light-field display can render as a spatially positioned participant on the Beam side.
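To make the idea of mapping participants to a virtual table concrete, here is a minimal layout sketch. It is not Google's method, which is unpublished. It simply spreads a given number of seats evenly along an arc in front of the viewer. The function name `seat_positions`, the 1.5 m radius, and the 120-degree arc are all assumptions chosen for illustration.

```python
import math


def seat_positions(
    count: int,
    radius_m: float = 1.5,   # ASSUMPTION: illustrative viewing distance
    arc_deg: float = 120.0,  # ASSUMPTION: illustrative spread of the virtual table
) -> list[tuple[float, float]]:
    """Spread `count` seats evenly along an arc centred on the viewer.

    Returns (x, z) pairs in metres relative to the viewer:
    x is left (negative) or right (positive), z is the forward distance.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        angles = [0.0]
    else:
        step = arc_deg / (count - 1)
        angles = [-arc_deg / 2 + i * step for i in range(count)]
    return [
        (radius_m * math.sin(math.radians(a)), radius_m * math.cos(math.radians(a)))
        for a in angles
    ]


for x, z in seat_positions(3):
    print(f"x={x:+.2f} m, z={z:.2f} m")
```

A real system would start from the inferred positions rather than an even spread; the sketch only shows the kind of geometry the renderer needs as input.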

Local head-tracking on the Beam side continuously updates depth rendering as the in-room viewer moves. When the viewer shifts left or right, the rendered positions of remote participants shift in perspective-correct ways — the same parallax you'd expect from physically moving in a real meeting room. This is where the six-camera array is essential: accurate head position data drives the real-time depth adjustment. The system maps where the local viewer's gaze falls on the display and adjusts rendered eye direction for the corresponding remote participant accordingly.
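The parallax behaviour described above follows from simple projection geometry. The sketch below assumes a flat display plane and a single tracked eye position, and computes where the line from the eye to a virtual point crosses the display. `Point3D` and `project_to_display` are hypothetical names; Beam's actual rendering pipeline is not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    x: float  # metres, positive to the viewer's right
    y: float  # metres, positive upward
    z: float  # metres, distance from the display plane (always non-negative here)


def project_to_display(eye: Point3D, target: Point3D) -> tuple[float, float]:
    """Return the (x, y) point where the eye-to-target ray crosses the display.

    The eye sits in front of the display at distance eye.z; the virtual
    target sits behind the display at depth target.z.
    """
    if eye.z <= 0 or target.z < 0:
        raise ValueError("eye must be in front of the display, target at or behind it")
    # Similar triangles: fraction of the ray travelled when it reaches the display.
    t = eye.z / (eye.z + target.z)
    return (eye.x + (target.x - eye.x) * t, eye.y + (target.y - eye.y) * t)


# A participant rendered 0.5 m behind the display, viewer 1 m in front of it.
participant = Point3D(0.0, 0.0, 0.5)
centred = project_to_display(Point3D(0.0, 0.0, 1.0), participant)
shifted = project_to_display(Point3D(0.2, 0.0, 1.0), participant)
print(centred, shifted)
```

When the viewer moves 0.2 m to the right, the on-screen image of the participant moves about 0.067 m in the same direction, which is what makes the participant appear to sit behind the glass rather than on it. A point at depth zero would not move at all.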

One limitation worth flagging for evaluators: the spatial reconstruction is an estimation, not a measurement. Partial occlusion, atypical camera angles, or unusual room configurations can degrade position inference accuracy. Google has not published the model architecture, training data composition, or accuracy benchmarks for the spatial reconstruction component. Until that documentation is available, "perceptual equality" is an aspirational design target rather than a measured perceptual outcome.

Full 3D on Meet and Zoom, Flat Rendering on Teams and Webex

Remote participants who join via Google Meet or Zoom receive full life-size 3D spatial rendering on the Beam display. Participants who join via Microsoft Teams or Cisco Webex can participate in the call but appear only in 2D, with no spatial rendering and no virtual table positioning. Google has not announced a completion date for Teams or Webex 3D integration. As of May 2026, the platform-level work for those services is incomplete with no timeline committed.

The commercial implications are significant. Microsoft Teams and Cisco Webex dominate large enterprise deployments — the same enterprise tier that constitutes the primary buyer for a $24,999+ room system. An organization where a portion of participants use Teams and others use Meet would see mixed rendering on the Beam display: some participants life-size and spatially positioned, others in a flat 2D tile. That inconsistency directly undermines the "perceptual equality" thesis Beam is built around.

Platform	Beam Group Rendering (May 2026)	Integration Status	Commercial Note
Google Meet	Full life-size 3D spatial rendering	Supported	Native integration; full feature set available
Zoom	Full life-size 3D spatial rendering	Supported	Large market share; strengthens Beam's deployment case
Microsoft Teams	2D only — no spatial rendering	Incomplete; no date announced	Dominant in enterprise; gap is commercially material
Cisco Webex	2D only — no spatial rendering	Incomplete; no date announced	Significant in regulated industries; same limitation applies

From an integration engineering perspective, the gap likely reflects differences in how each platform surfaces its video stream and participant metadata APIs. Meeting platform APIs vary significantly in what they expose for third-party consumption — frame-level access, participant roster data, and spatial metadata are not uniformly available. Google's ability to deliver full 3D for Meet and Zoom but not for Teams or Webex implies those platforms have not yet provided, or agreed to provide, the integration surface the spatial reconstruction pipeline requires. Whether this is a technical, commercial, or contractual constraint is not stated in any public announcement.

Enterprises evaluating Beam need to map their existing UC platform mix before procurement. A Teams-dominated organization would currently get partial functionality from a Beam deployment. IT architects building the integration layer should also plan for future platform parity updates that may require software upgrades, recertification, or renegotiated licensing terms when — and if — Teams and Webex 3D support arrives.
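A platform-mix check of this kind can be sketched in a few lines. The rendering modes below come from the support table in this section as of May 2026; the participant counts are hypothetical, and `rendering_mix` is an illustrative helper, not part of any vendor tooling.

```python
from collections import Counter

# Rendering mode per platform, as reported for the May 2026 experiment.
RENDERING_BY_PLATFORM = {
    "Google Meet": "3D",
    "Zoom": "3D",
    "Microsoft Teams": "2D",
    "Cisco Webex": "2D",
}


def rendering_mix(participants_by_platform: dict[str, int]) -> dict[str, float]:
    """Return the share of participants per rendering mode on a Beam display."""
    totals: Counter[str] = Counter()
    for platform, count in participants_by_platform.items():
        # Platforms outside the published table are reported as "unknown".
        totals[RENDERING_BY_PLATFORM.get(platform, "unknown")] += count
    total = sum(totals.values())
    if total == 0:
        return {}
    return {mode: n / total for mode, n in totals.items()}


# Hypothetical Teams-heavy organization.
mix = rendering_mix({"Microsoft Teams": 70, "Google Meet": 20, "Zoom": 10})
print(mix)
```

For this hypothetical mix, 70% of remote participants would appear as flat 2D tiles, which is the partial-functionality scenario described above.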

Spatial Audio and Real-Time Speech Translation

Google Beam's perceptual layer extends beyond visual rendering to include spatial audio: each speaker's voice is anchored to their virtual position on the display. When a participant is positioned to the left on the virtual table, their voice arrives from the left. When the local viewer shifts head position and the depth rendering updates, the audio field shifts in coordination, maintaining perceptual coherence between visual location and sound source direction. This position-locked audio design reinforces the spatial illusion that makes the light-field display meaningful rather than purely visual.

The head-tracking system drives both depth rendering and audio spatialization simultaneously. That coupling matters: if audio and visual cues fall out of sync (a participant appearing at one screen position while their voice arrives from another direction), the spatial coherence breaks and the sense of presence degrades. The 60fps rendering pipeline must keep both modalities aligned within perceptual thresholds, which places latency requirements on the audio processing path that differ from ordinary conferencing audio pipelines. This is a non-trivial engineering constraint, especially as participant count in group meetings scales up.
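Two pieces of arithmetic sit behind this coupling: the per-frame time budget at 60 fps, and the way a tracked head position changes the direction a voice should arrive from. The sketch below uses constant-power stereo panning as a simple stand-in. Beam's actual audio renderer is not documented, and `stereo_gains` and the 45-degree limit are assumptions.

```python
import math

FRAME_RATE_FPS = 60
FRAME_BUDGET_MS = 1000 / FRAME_RATE_FPS  # roughly 16.7 ms per rendered frame


def stereo_gains(
    source_x_m: float,
    head_x_m: float,
    distance_m: float,
    max_azimuth_deg: float = 45.0,  # ASSUMPTION: angle mapped to a full left/right pan
) -> tuple[float, float]:
    """Constant-power (left, right) gains for a voice anchored at source_x_m.

    The azimuth is measured from the tracked head position, so the gains
    change whenever the viewer moves.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    azimuth_deg = math.degrees(math.atan2(source_x_m - head_x_m, distance_m))
    pan = max(-1.0, min(1.0, azimuth_deg / max_azimuth_deg))  # -1 left, +1 right
    theta = (pan + 1.0) * math.pi / 4.0                       # 0 to pi/2
    return math.cos(theta), math.sin(theta)


print(f"frame budget: {FRAME_BUDGET_MS:.1f} ms")
print(stereo_gains(source_x_m=-0.5, head_x_m=0.0, distance_m=1.5))
print(stereo_gains(source_x_m=-0.5, head_x_m=-0.5, distance_m=1.5))
```

In the second call the viewer has moved directly in front of the speaker, so the two channel gains become equal. Any head-position update has to reach both the renderer and the audio path within the same frame budget for the cues to stay aligned.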

Beam also includes an AI-powered Speech Translation feature enabling real-time language translation during calls without a separate tool or interpreter workflow. The translated output is, by implication, spatialized in the same way as the original — arriving from the direction of the speaker on the display. Google has not published supported language pairs, translation latency figures, or accuracy benchmarks for this component. For enterprises operating across language borders, the feature reduces workflow friction in multi-language calls, but evaluation against specific language pairs and domain vocabulary requires hands-on testing.

Together, spatial audio and speech translation form the perceptual infrastructure that distinguishes Beam from a display technology. The visual rendering addresses where people appear; the audio layer addresses how their presence is felt; the translation layer removes language as a barrier. Whether these components perform reliably under group meeting conditions — overlapping speech, acoustic variability across multiple remote rooms, and translation latency under load — is precisely what the experiment phase should be measuring.

Google's Research Numbers: 50% Stronger Social Connection, 21% More Participation

Google cites two headline metrics from internal user research: a 50% increase in perceived social connection and a 21% increase in active participation among Beam users compared to standard grid video calls. These figures appear in official announcements and have been cited in trade press coverage of the I/O reveal. Neither metric has been independently reproduced or published in a peer-reviewed venue as of May 2026.

Per VoIP Review's coverage of the Beam announcement, Google's internal research reports that Beam users feel 50% more socially connected and are 21% more likely to actively participate compared to standard video calls. The study design, participant population, and comparison baseline have not been published.

These metrics align with the theoretical underpinning of Beam: if spatial presence activates the same social cognition as physical co-location, behavioral outcomes like participation rates and feelings of connection should improve. The "perceptual equality" design thesis predicts exactly this kind of effect. But the path from a design thesis to a validated causal claim runs through independent replication, controlled study conditions, and domain-specific testing — none of which has been published. The study population, task type, comparison condition (what specific grid video setup served as the baseline?), and measurement instruments are not documented in any public material.

For practitioners building internal ROI cases for Beam deployment, these numbers offer directional support but should not be cited as externally validated benchmarks. An organization making a $24,999+ per-endpoint investment would benefit from running its own pilot with its own participation and satisfaction instrumentation rather than relying on vendor-sourced metrics from undisclosed study conditions.
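A pilot of this kind reduces to comparing a participation rate between two conditions. The sketch below shows the arithmetic only; the counts are hypothetical, the definition of "active" is something each organization would have to choose, and `participation_rate` and `relative_lift` are illustrative names.

```python
def participation_rate(active: int, attendees: int) -> float:
    """Share of attendees who actively participated (by your own definition)."""
    if attendees <= 0:
        raise ValueError("attendees must be positive")
    if not 0 <= active <= attendees:
        raise ValueError("active must be between 0 and attendees")
    return active / attendees


def relative_lift(baseline: float, treatment: float) -> float:
    """Relative change of treatment over baseline, e.g. 0.25 means 25% higher."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treatment - baseline) / baseline


# Hypothetical pilot counts: grid video calls versus Beam sessions.
grid = participation_rate(active=40, attendees=100)
beam = participation_rate(active=50, attendees=100)
print(f"relative lift: {relative_lift(grid, beam):.0%}")
```

A point estimate like this says nothing about statistical significance; a real pilot would also need enough sessions and a consistent baseline setup to support any conclusion.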

The direction of the effect is plausible and consistent with existing HCI research on spatial presence. The question is not whether better spatial presence improves engagement (it very likely does) but whether the magnitude of improvement at this technology's current fidelity and constraints justifies the specific cost structure. The 50% and 21% figures suggest that a measurable effect exists; they are not sufficient for sizing that effect in a particular enterprise context.

Deloitte, Salesforce, the USO: Named Partners and Gaps Ahead

Google confirmed five named enterprise early adopters for the Beam group meeting experiment: Deloitte, Salesforce, Citadel, NEC, and Duolingo. All are operating under private beta terms. The selection spans professional services (Deloitte), enterprise SaaS (Salesforce), finance (Citadel), technology manufacturing (NEC), and consumer software (Duolingo), a range that suggests Google is stress-testing Beam across distinct use cases and meeting dynamics rather than optimizing for a single vertical. No case studies, usage data, or feedback from any named partner has been published.

A separate partnership with the United Service Organizations (USO) deploys Beam at service centers in the US and internationally for deployed military personnel, starting in 2026. As reported by Engadget, this extends Beam's scope beyond enterprise productivity into high-stakes personal communication: circumstances where presence quality has emotional rather than economic weight. The USO deployment is almost certainly subsidized or donated, and should not be read as a commercial pricing signal, but it provides a high-visibility validation context for the underlying technology.

What is conspicuously absent from Google's announcements: a general availability roadmap with firm dates, a published volume pricing structure, managed IT support terms, certified room design guidelines, or an upgrade path for customers who buy hardware now. The commitment to wider trials "later this year" is a directional signal, not a delivery commitment. Organizations with formal procurement cycles, facilities planning requirements, and IT governance processes have no actionable timeline from the current disclosure.

For developers and technical evaluators, the absence of public APIs, integration specifications, or a developer preview program is a practical gap. Beam is positioned as an enterprise product, but the standard enterprise evaluation workflow — proof of concept, IT security review, integration testing, staged rollout — cannot begin without published technical documentation. That documentation does not yet exist in the public domain.

Frequently Asked Questions

Can Microsoft Teams users join a Google Beam group call in full 3D?

No. Microsoft Teams participants can join a Google Beam group meeting, but they appear only in 2D on the Beam display; no life-size spatial rendering is applied. Full 3D rendering currently requires participants to join via Google Meet or Zoom. Cisco Webex carries the same limitation as of May 2026. Google has not announced a timeline for completing Teams or Webex 3D integration.

Do remote participants need special equipment to appear on a Beam display?

No. Remote participants join via a standard laptop or conference room camera on Google Meet or Zoom; no Beam hardware, depth sensors, or special client software is required on their end. Only the in-room endpoint requires the HP Dimension device. The AI spatial reconstruction pipeline runs entirely server-side; the remote participant contributes standard 2D video that the Beam infrastructure processes into a spatially positioned 3D representation rendered on the local display.

How does Beam infer 3D spatial position when remote cameras capture only flat 2D?

ML models trained on meeting room layouts estimate each participant's position from 2D video frames. The models infer approximate 3D placement, including relative spacing between participants in the same remote room, and map those positions to a virtual table on the Beam light-field display. Head-tracking on the local Beam side then adjusts depth rendering continuously as the in-room viewer moves, maintaining perspective-correct parallax. Google has not published the model architecture, training data composition, or accuracy benchmarks for this spatial reconstruction component.

What does the HP Dimension device cost, and what room setup does it require?

The HP Dimension is listed at $24,999 per installation for hardware. The Beam software license is an additional cost that Google has not publicly disclosed. The device requires a dedicated room with managed lighting and is incompatible with shared or multi-use conference rooms. Google has not published certified room design specifications or detailed facilities requirements beyond this high-level constraint. Total cost of ownership must account for both the undisclosed software licensing and the facilities commitment.

When will Google Beam group meetings be broadly available?

No general availability date has been announced. As of May 2026, the group meeting experiment is limited to five named enterprise partners: Deloitte, Salesforce, Citadel, NEC, and Duolingo, all under private beta terms. Google committed to wider trials "later this year" with no specific milestone dates, pricing tiers, or service-level commitments attached. Further announcements are anticipated at InfoComm 2026.

What This Experiment Tells You — and What It Doesn't

The Google Beam group meeting experiment is a technically meaningful capability expansion built on a narrow hardware and platform foundation. The core achievement — inferring 3D spatial arrangement from ordinary 2D cameras and rendering it life-size on a light-field display — is a substantive step beyond the prior bilateral-only architecture. Spatial audio coupling, real-time speech translation, and continuous head-tracking combine into a coherent perceptual model. For the specific scenario where a single Beam endpoint hosts a meeting with multiple Meet or Zoom participants, the system delivers something qualitatively distinct from a standard video grid.

The gaps are as significant as the capabilities. Teams and Webex are unsupported for 3D rendering, a structural problem for any enterprise where those platforms are the standard. The hardware commitment ($24,999 plus undisclosed software licensing, dedicated room, managed lighting) makes this a facilities investment, not a software decision. The experiment label is accurate: multi-person 3D spatial reconstruction from heterogeneous 2D sources is not a production-grade system yet, and no production timeline has been committed. The internal research metrics showing a 50% lift in social connection and 21% in participation are directional signals, not independently verified benchmarks suitable for enterprise ROI justification.

For developers and technical architects, the practical near-term action is to monitor the InfoComm 2026 announcements and wait for published integration documentation — APIs, SLAs, room design specs, and platform parity timelines. The technology direction is credible, but the platform is not yet in a state that supports broad evaluation without more disclosure. If your organization is on the named partner list, you have access to information that is not publicly available. If not, the experiment phase has not yet produced enough signal to drive procurement or infrastructure planning decisions.

Last updated: 2026-05-30. Based on announcements made at Google I/O 2026 on May 20, 2026 and subsequent trade press coverage through May 28, 2026. Platform capabilities, partner status, and pricing may change as the experiment phase expands.
