TransCity: Next-Generation Smart City Foundation Model

Understanding cities through physical-world dynamics.

TransCity is a multimodal vision–language foundation model designed to learn cities directly from their spatio-temporal dynamics. By integrating urban imagery, maps, traffic, infrastructure signals, weather, and external reports, TransCity enables predictive, interpretable, and evidence-grounded reasoning for real-world city decision-making. From mobility and logistics to energy and resilience, TransCity transforms urban data into actionable city intelligence.

Build with TransCity

We've designed TransCity for real-world deployment, scalable performance, and trustworthy urban intelligence. We look forward to seeing how you build the future of cities.

Get TransCity Model →

Model Architecture

MoMExp Nets compress high-entropy multimodal data into structured representations and project them into the TransCity core model. The TAME framework within the core model supports seamless switching between instant and deliberative ("thinking") modes. An agent system augments user queries through real-time retrieval from databases and web sources.

Learn more

Data-to-Reasoning Traceability

Agent-based retrieval-augmented generation gathers multimodal urban evidence on demand. Reasoning chains are constructed over the retrieved data, enabling explicit multimodal analysis. Each conclusion remains traceable to its underlying data sources.

Watch Video

Model Interpretability

MoMExp attention patterns provide interpretable insights into modality and spatial relevance. The TAME mixture-of-experts architecture exposes expert routing and decision pathways. Model reasoning, evidence usage, and answer confidence can be systematically analyzed.

Learn more

Technology Book

A complete technical reference covering TransCity's four-stage training pipeline. Includes implementation details for the foundation model, MoMExp Nets, and agent system. Designed for reproducibility, extension, and real-world deployment.

Learn more

Multimodal Cities Data

Smart city queries are defined by an analytical objective and its surrounding context.

Each question is represented as Query + Precise Data (PreD) + High-Entropy Data (HieD) + Visual Data, where PreD corresponds directly to the objective and HieD provides complementary urban context without explicit alignment.
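
The four-part representation above can be pictured as a simple record. The following is a minimal sketch only: the class name `CityQuery`, its field names, and the example values are hypothetical and are not taken from the TransCity codebase.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CityQuery:
    """Hypothetical container for one smart city question."""
    query: str                                            # analytical objective, in natural language
    pred: dict[str, Any]                                  # Precise Data: aligned directly with the objective
    hied: dict[str, Any] = field(default_factory=dict)    # High-Entropy Data: unaligned urban context
    visual: list[str] = field(default_factory=list)       # image references (paths or IDs)

    def modalities(self) -> list[str]:
        """List which of the four components are populated."""
        present = ["query", "pred"]
        if self.hied:
            present.append("hied")
        if self.visual:
            present.append("visual")
        return present


# Illustrative values only, not real observations.
example = CityQuery(
    query="Forecast bus boardings at stop S1 for the next hour.",
    pred={"stop_id": "S1", "boardings_last_4h": [120, 135, 150, 142]},
    hied={"weather": "light rain", "nearby_event": "concert at 20:00"},
    visual=["stop_S1_street_view.png"],
)
```

Keeping PreD and HieD in separate fields mirrors the distinction in the text: one is tied to the objective, the other is context whose relevance the model has to work out.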

TransCity Architecture

Query and Precise Data (PreD) are directly injected into the core model.

High-Entropy Data (HieD) are processed by modality-specific MoMExp Nets and projected into the model through learned projectors.

Visual inputs are encoded by an image encoder and integrated via a projection layer, enabling unified multimodal reasoning.
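
As a rough illustration of the data flow described above, the sketch below maps per-modality feature vectors of different widths into one shared embedding width and appends them to the text tokens. It assumes plain linear projectors with random weights; the width `D_MODEL`, the modality names, and all helper functions are hypothetical stand-ins, not the actual MoMExp Nets or projectors.

```python
import random

D_MODEL = 8  # assumed core-model embedding width (illustrative only)


def make_projector(d_in: int, d_out: int, seed: int) -> list[list[float]]:
    """Random d_in x d_out matrix standing in for a learned projector."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]


def project(vec: list[float], weights: list[list[float]]) -> list[float]:
    """Linear map: vec (d_in) times weights (d_in x d_out) gives a d_out vector."""
    d_out = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(d_out)]


# Compressed outputs of two hypothetical modality-specific encoders,
# each with its own feature width.
encoded = {
    "weather": [0.2, 0.7, 0.1],
    "traffic": [0.9, 0.4, 0.3, 0.8, 0.5],
}
projectors = {
    name: make_projector(len(vec), D_MODEL, seed=i)
    for i, (name, vec) in enumerate(encoded.items())
}

# Query and PreD token embeddings enter the core model directly (no projector).
text_tokens = [[0.0] * D_MODEL for _ in range(4)]

# Each projected modality becomes one extra token in the shared sequence.
modality_tokens = [project(vec, projectors[name]) for name, vec in encoded.items()]
sequence = text_tokens + modality_tokens
```

The point of the projection is only that every input, whatever its native width, ends up in the same space as the text tokens so the core model can attend over all of them together.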

Task-Aware Mixture of Experts (TAME)

TransCity employs a mixture-of-experts architecture with a dynamic router that selects different expert subsets for instant responses and deliberative reasoning.

A shared expert pathway ensures consistency across modes, while attention-based aggregation integrates expert outputs into the core model.
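
The mode-dependent routing described above can be sketched with scalar toy experts. This is an assumption-heavy illustration: the top-k sizes (1 for instant, 3 for thinking), the toy experts, and the softmax-weighted sum used in place of attention-based aggregation are all simplifications chosen for brevity, not the published TAME design.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores: list[float], mode: str) -> list[int]:
    """Pick the top-k routed experts; k depends on the mode (assumed values)."""
    k = 1 if mode == "instant" else 3
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])


def tame_layer(x: float, router_scores: list[float], mode: str) -> float:
    # Toy experts: expert i scales its input by (i + 1).
    experts = [lambda v, s=i + 1: s * v for i in range(len(router_scores))]
    shared_expert = lambda v: v  # always active, in both modes

    chosen = route(router_scores, mode)
    weights = softmax([router_scores[i] for i in chosen])
    routed = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return shared_expert(x) + routed


scores = [0.1, 2.0, 0.3, 1.5]
instant_experts = route(scores, "instant")    # [1]
thinking_experts = route(scores, "thinking")  # [1, 2, 3]
```

Because the shared expert contributes in both modes, the two outputs differ only in the routed term, which is one way to read the claim that the shared pathway keeps the modes consistent.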

AI Agent System

An agent planner decomposes the user query into a retrieval workflow. Functional agents collect multimodal urban data, which are subsequently verified, refined, and aggregated.

The system iterates when necessary and produces a validated multimodal data package for downstream reasoning.
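
The plan, collect, verify, and iterate cycle might look like the loop below. It is a toy sketch: the keyword-based planner, the agent functions, the verification rule, and the `max_rounds` limit are all hypothetical, and a real planner would rely on the model rather than keyword matching.

```python
from typing import Callable

Record = dict  # one retrieved item, e.g. {"source": "traffic", "value": 1830}


def plan(query: str) -> list[str]:
    """Toy planner: choose data sources by keyword."""
    q = query.lower()
    keywords = {"traffic": ("traffic",), "weather": ("weather", "rain")}
    sources = [name for name, words in keywords.items()
               if any(w in q for w in words)]
    return sources or ["web"]


def verify(record: Record) -> bool:
    """Keep only records that carry a value."""
    return record.get("value") is not None


def run_agents(query: str, agents: dict[str, Callable[[int], Record]],
               max_rounds: int = 3) -> dict:
    needed = plan(query)
    package: dict[str, Record] = {}
    for attempt in range(max_rounds):
        missing = [s for s in needed if s not in package]
        if not missing:
            break  # every planned source has verified data
        for source in missing:
            record = agents[source](attempt)
            if verify(record):
                package[source] = record
    complete = all(s in package for s in needed)
    return {"query": query, "evidence": package, "complete": complete}


def traffic_agent(attempt: int) -> Record:
    return {"source": "traffic", "value": 1830}  # illustrative value


def weather_agent(attempt: int) -> Record:
    # Simulates a source that fails on the first try.
    return {"source": "weather", "value": None if attempt == 0 else "rain"}


result = run_agents("How does rain affect traffic on the motorway?",
                    {"traffic": traffic_agent, "weather": weather_agent})
```

Here the weather agent fails verification in the first round, so the loop retries only that source before returning the completed package.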

Global Training Data

Data Sources and Coverage

TransCity is trained on large-scale urban datasets collected across Europe, North America, Asia, and Oceania, integrating heterogeneous urban signals such as traffic and mobility, electricity and energy systems, points of interest, demographics, weather, urban events, web reports, and remote sensing imagery.

The corpus spans a 14-year temporal horizon from 2010 to 2024 and covers more than 21 million city-days of multimodal urban observations.

After tokenization, the complete corpus amounts to approximately 16 billion tokens.

The four-stage training pipeline is completed in approximately 22 days using 16 NVIDIA H20 GPUs with 96 GB memory each.
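
For a sense of scale, the figures above can be combined with simple arithmetic. The inputs below are the rounded numbers from the text, so the results are order-of-magnitude estimates, not reported statistics.

```python
city_days = 21_000_000       # "more than 21 million", used as a lower bound
tokens = 16_000_000_000      # "approximately 16 billion"
gpus, days = 16, 22          # training hardware and duration
memory_per_gpu_gb = 96

tokens_per_city_day = tokens / city_days        # about 762, an upper estimate
gpu_hours = gpus * days * 24                    # 8448 GPU-hours
aggregate_memory_gb = gpus * memory_per_gpu_gb  # 1536 GB across the cluster

print(f"{tokens_per_city_day:.0f} tokens per city-day (at most)")
print(f"{gpu_hours} GPU-hours, {aggregate_memory_gb} GB aggregate GPU memory")
```

The tokens-per-city-day figure is only a rough average: the corpus is described elsewhere as including question-answer pairs and reasoning chains, so not every token is a raw observation.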

Learn More →

New Zealand National Multi-Source Traffic Dataset

This dataset covers nationwide highway traffic in New Zealand from January 2013 to January 2022, spanning over 9 years at a 15-minute temporal resolution. It includes traffic flow records from 2,042 sensors, integrated with 55 weather stations, and provides vehicle-type–specific (light/heavy) and direction-aware measurements with rich sensor metadata and event annotations.
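
The stated resolution and time span imply the following series length per sensor. This is a back-of-the-envelope sketch: the text gives only months, so the first day of each month is assumed as the endpoint, and the total is an upper bound that ignores gaps, outages, and sensors installed partway through the period.

```python
from datetime import date

# Assumed endpoints (the source states only "January 2013 to January 2022").
start, end = date(2013, 1, 1), date(2022, 1, 1)

intervals_per_day = 24 * 60 // 15              # 96 readings per day at 15 minutes
days = (end - start).days                      # 3287 days
steps_per_sensor = days * intervals_per_day    # 315,552 time steps
max_readings = steps_per_sensor * 2042         # 644,357,184 if every sensor were complete

print(f"{steps_per_sensor:,} steps per sensor, up to {max_readings:,} flow readings")
```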

Learn More →

Singapore Bus Ridership Dataset

This dataset centers on bus ridership records from 5,172 bus stops in Singapore, capturing passenger boarding and alighting volumes across the city. The transit data are enriched with land-use features, including surrounding points of interest (POIs), residential HDB building attributes, and public transport connectivity information from the bus and metro systems, which together characterize local mobility demand and urban functional context.

Learn More →

ENTSO-E European Electricity Bidding Zone Dataset

This large-scale electricity market dataset is compiled from the ENTSO-E Transparency Platform and covers more than 50 European bidding zones, the fundamental spatial units of the European power market. It spans 2014–2024, providing ten years of continuous historical records across multiple countries and sub-national zones, and captures the long-term spatio-temporal dynamics of electricity systems at the bidding-zone level.

Learn More →

TransCity Capabilities

TransCity connects data, people, and cities to turn complex urban information into clear, actionable insight—supporting smarter decisions across mobility, energy, and everyday city life.

Start building with TransCity →

Natively Multimodal

TransCity is built as a natively multimodal foundation model for cities.

It jointly processes text, time-series, spatial graphs, and visual data through early fusion, enabling unified spatio-temporal reasoning rather than isolated modality alignment.

Agent-Driven Retrieval Plan

TransCity employs an agent system to analyze user queries and translate them into explicit spatio-temporal constraints.

Based on these constraints, agents generate structured retrieval plans and assemble long-context multimodal evidence from databases and web sources.
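
One way to picture the step from constraints to a structured retrieval plan is shown below. This is a sketch under stated assumptions: the `Constraints` fields, the `build_plan` helper, and the variable names are hypothetical, and the step that derives constraints from a natural-language query is omitted.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Constraints:
    region: str                  # named area or zone identifier
    start: datetime              # beginning of the time window
    end: datetime                # end of the time window
    variables: tuple[str, ...]   # quantities to retrieve


def build_plan(c: Constraints) -> list[dict]:
    """One retrieval step per variable, all sharing the same space-time window."""
    return [
        {"step": i + 1, "variable": v, "region": c.region,
         "from": c.start.isoformat(), "to": c.end.isoformat()}
        for i, v in enumerate(c.variables)
    ]


# Example constraints for an electricity question (illustrative values).
constraints = Constraints(
    region="DE-LU",
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 8),
    variables=("day_ahead_price", "load", "weather"),
)
retrieval_plan = build_plan(constraints)
```

Making the constraints explicit before retrieval means every later piece of evidence can be checked against the same region and time window.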

Data-to-Reasoning Traceability

TransCity reasons over agent-retrieved multimodal data in a step-by-step manner.

Each reasoning step is explicitly grounded in the corresponding evidence, enabling traceable and auditable data-to-decision workflows.
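
Grounding each step in evidence can be represented by attaching evidence identifiers to every claim, which makes an audit a simple lookup. The structure below is a hypothetical sketch: the `Step` class, the evidence IDs, and the example chain are invented for illustration and do not reflect TransCity's internal format.

```python
from dataclasses import dataclass


@dataclass
class Step:
    claim: str
    evidence_ids: list[str]


# Evidence store keyed by ID (illustrative descriptions).
evidence = {
    "E1": "traffic sensor feed, 15-minute counts",
    "E2": "weather station record",
}

chain = [
    Step("Flow drops during the evening peak.", ["E1"]),
    Step("Heavy rain coincides with the drop.", ["E1", "E2"]),
    Step("Rain is the likely cause.", []),  # deliberately left ungrounded
]


def ungrounded(steps: list[Step], store: dict[str, str]) -> list[str]:
    """Return claims that cite no evidence or cite unknown IDs."""
    return [s.claim for s in steps
            if not s.evidence_ids or any(e not in store for e in s.evidence_ids)]


def sources_used(steps: list[Step], store: dict[str, str]) -> list[str]:
    """Return the sorted IDs of all known evidence cited anywhere in the chain."""
    return sorted({e for s in steps for e in s.evidence_ids if e in store})


flagged = ungrounded(chain, evidence)
cited = sources_used(chain, evidence)
```

An auditor can then read `flagged` to find steps that need supporting data and `cited` to see which sources the conclusion ultimately rests on.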

Answers with Confidence

TransCity produces answers accompanied by confidence-aware analysis.

By tracking evidence usage, expert routing, and reasoning depth, the system provides users with more reliable and trustworthy responses.

Resources

Explore the latest tools, documentation, and best practices as you build with TransCity.

Open-Source Model

TransCity is released as an open-source foundation model to support sustainable smart city services for governments, enterprises, non-profit organizations, and individual users.

HuggingFace

Modular Codebase

The codebase is organized into distinct components, including the four-stage training pipeline, the agent system, and local deployment tutorials.

Technology Book

Data Sources

TransCity is trained on large-scale multimodal smart city data, including question-answer pairs, reasoning chains, and real-world urban observations spanning 14 years across multiple continents.

Learn more