TransCity: Next-Generation Smart City Foundation Model
Understanding cities through physical-world dynamics.
TransCity is a multimodal vision–language foundation model designed to learn cities directly from their spatio-temporal dynamics. By integrating urban imagery, maps, traffic, infrastructure signals, weather, and external reports, TransCity enables predictive, interpretable, and evidence-grounded reasoning for real-world city decision-making. From mobility and logistics to energy and resilience, TransCity transforms urban data into actionable city intelligence.
Build with TransCity
We've designed TransCity for real-world deployment, scalable performance, and trustworthy urban intelligence. We look forward to seeing how you build the future of cities.
Get TransCity Model →

Model Architecture
MoMExp Nets compress high-entropy multimodal data into structured representations and project them into the TransCity core model. The TAME framework within the core model supports seamless switching between instant and deliberative ("thinking") modes. An agent system augments user queries through real-time retrieval from databases and web sources.
Data to Reasoning Traceability
Agent-based retrieval-augmented generation gathers multimodal urban evidence on demand. Reasoning chains are constructed over retrieved data, enabling explicit multimodal analysis. Each conclusion remains traceable to its underlying data sources.
Model Interpretability
MoMExp attention patterns provide interpretable insights into modality and spatial relevance. The TAME mixture-of-experts architecture exposes expert routing and decision pathways. Model reasoning, evidence usage, and answer confidence can be systematically analyzed.
Technology Book
A complete technical reference covering TransCity's four-stage training pipeline. Includes implementation details for the foundation model, MoMExp Nets, and agent system. Designed for reproducibility, extension, and real-world deployment.
Multimodal Cities Data
Smart city queries are defined by an analytical objective and its surrounding context.
Each question is represented as Query + Precise Data (PreD) + High-Entropy Data (HieD) + Visual Data, where PreD directly corresponds to the objective and HieD provides complementary urban context without explicit alignment.
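The decomposition above can be sketched as a simple container. This is an illustrative structure only; the field names and example values are hypothetical, not TransCity's released API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Query + PreD + HieD + Visual decomposition.
@dataclass
class CityQuery:
    query: str                                  # analytical objective in natural language
    pred: dict = field(default_factory=dict)    # Precise Data: directly aligned with the objective
    hied: list = field(default_factory=list)    # High-Entropy Data: unaligned urban context
    visual: list = field(default_factory=list)  # imagery / remote-sensing inputs

q = CityQuery(
    query="Forecast weekday bus ridership near Orchard Road",
    pred={"stop_ids": [46009, 46019], "horizon_days": 7},
    hied=[{"modality": "weather", "source": "station_12"},
          {"modality": "events", "source": "web_reports"}],
)
```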
TransCity Architecture
Query and Precise Data (PreD) are directly injected into the core model.
High-Entropy Data (HieD) are processed by modality-specific MoMExp Nets and projected into the model through learned projectors.
Visual inputs are encoded by an image encoder and integrated via a projection layer, enabling unified multimodal reasoning.
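The three input paths above can be sketched as a toy forward pass: PreD tokens enter the core sequence directly, HieD is compressed by a MoMExp-style net and projected, and encoded visual features pass through their own projector. All shapes, the averaging "compression", and the random projectors are stand-ins for illustration, not the actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical core-model hidden size

def momexp_compress(hied_signal, out_dim=4):
    """Stand-in for a modality-specific MoMExp Net: compress a raw
    high-entropy signal into a short structured representation."""
    x = np.asarray(hied_signal, dtype=float)
    chunks = np.array_split(x, out_dim)       # crude compression: chunk-and-average
    return np.array([c.mean() for c in chunks])

def project(tokens, proj):
    """Learned-projector stand-in: map tokens into core-model space."""
    return tokens @ proj

proj_hied = rng.normal(size=(4, D))           # per-modality projector (random here)
proj_visual = rng.normal(size=(16, D))        # image-encoder projection layer

pred_tokens = rng.normal(size=(3, D))                                  # PreD: injected directly
hied_tokens = project(momexp_compress(rng.normal(size=96))[None, :], proj_hied)
visual_tokens = project(rng.normal(size=(2, 16)), proj_visual)         # encoded image features

# Unified multimodal sequence fed to the core model
sequence = np.concatenate([pred_tokens, hied_tokens, visual_tokens], axis=0)
```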
Task-Aware Mixture of Experts (TAME)
TransCity employs a mixture-of-experts architecture with a dynamic router that selects different expert subsets for instant responses and deliberative reasoning.
A shared expert pathway ensures consistency across modes, while attention-based aggregation integrates expert outputs into the core model.
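A minimal sketch of this routing behavior, assuming (purely for illustration) that "instant" mode activates one expert and "thinking" mode activates three; the expert budgets, softmax aggregation, and random weights are invented, not TransCity's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_EXPERTS = 8, 6

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
shared_expert = rng.normal(size=(D, D))       # shared pathway used in both modes
router_w = rng.normal(size=(D, N_EXPERTS))

def tame_layer(x, mode="instant"):
    """Task-aware routing sketch: the mode sets the expert budget, and
    softmax (attention-style) weights aggregate the expert outputs."""
    k = 1 if mode == "instant" else 3          # hypothetical expert budgets
    scores = x @ router_w
    top = np.argsort(scores)[-k:]              # dynamic router picks top-k experts
    weights = softmax(scores[top])
    routed = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return routed + x @ shared_expert          # shared pathway keeps modes consistent

x = rng.normal(size=D)
fast = tame_layer(x, mode="instant")
slow = tame_layer(x, mode="thinking")
```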
AI Agent System
An agent planner decomposes the user query into a retrieval workflow. Functional agents collect multimodal urban data, which are subsequently verified, refined, and aggregated.
The system iterates when necessary and produces a validated multimodal data package for downstream reasoning.
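The plan → retrieve → verify → aggregate loop can be sketched as follows. The planner output, data sources, and verification rule are all hypothetical placeholders for the real functional agents.

```python
# Illustrative sketch of the agent workflow; names are invented, not the released API.
def plan(query):
    """Planner stand-in: decompose the query into retrieval sub-tasks."""
    return [("traffic_db", query), ("weather_api", query), ("web_search", query)]

def retrieve(source, query):
    # Stand-in for a functional agent hitting a database or web source.
    return {"source": source, "query": query, "records": [f"{source}:r1"]}

def verify(item):
    # Reject empty payloads; failed items are retried on the next iteration.
    return bool(item["records"])

def run_agents(query, max_iters=3):
    evidence = []
    for _ in range(max_iters):
        done = {e["source"] for e in evidence}
        pending = [t for t in plan(query) if t[0] not in done]
        if not pending:
            break                              # all sub-tasks satisfied
        for source, q in pending:
            item = retrieve(source, q)
            if verify(item):
                evidence.append(item)
    return {"query": query, "evidence": evidence}  # validated multimodal package

package = run_agents("Was Friday's congestion caused by the stadium event?")
```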
Global Training Data
Data Sources and Coverage
TransCity is trained on large-scale urban datasets collected across Europe, North America, Asia, and Oceania, integrating heterogeneous urban signals such as traffic and mobility, electricity and energy systems, points of interest, demographics, weather, urban events, web reports, and remote sensing imagery.
The corpus spans a 14-year temporal horizon from 2010 to 2024 and covers more than 21 million city-days of multimodal urban observations.
After tokenization, the complete corpus amounts to approximately 16 billion tokens.
The four-stage training pipeline is completed in approximately 22 days using 16 NVIDIA H20 GPUs with 96 GB memory each.
Learn More →

New Zealand National Multi-Source Traffic Dataset
This dataset covers nationwide highway traffic in New Zealand from January 2013 to January 2022, spanning nine years at a 15-minute temporal resolution. It includes traffic flow records from 2,042 sensors, integrated with 55 weather stations, and provides vehicle-type–specific (light/heavy) and direction-aware measurements with rich sensor metadata and event annotations.
Learn More →

Singapore Bus Ridership Dataset
This dataset centers on bus ridership records from 5,172 bus stops in Singapore, capturing passenger boarding and alighting volumes across the city. The transit data are enriched with land-use features, including surrounding points of interest (POIs), residential HDB building attributes, and public transport connectivity information from bus and metro systems, providing a comprehensive characterization of local mobility demand and urban functional context.
Learn More →

ENTSO-E European Electricity Bidding Zone Dataset
This large-scale electricity market dataset is compiled from the ENTSO-E Transparency Platform, covering more than 50 European bidding zones that represent the fundamental spatial units of the European power market. The dataset spans 2014–2024, providing ten years of continuous historical records across multiple countries and sub-national zones, and captures long-term spatio-temporal dynamics of electricity systems at the bidding-zone level.
Learn More →

TransCity Capabilities
TransCity connects data, people, and cities to turn complex urban information into clear, actionable insight—supporting smarter decisions across mobility, energy, and everyday city life.
Start building with TransCity →

Natively Multimodal
TransCity is built as a natively multimodal foundation model for cities.
It jointly processes text, time-series, spatial graphs, and visual data through early fusion, enabling unified spatio-temporal reasoning rather than isolated modality alignment.
Agent-Driven Retrieval Planning
TransCity employs an agent system to analyze user queries and translate them into explicit spatio-temporal constraints.
Based on these constraints, agents generate structured retrieval plans and assemble long-context multimodal evidence from databases and web sources.
Data-to-Reasoning Traceability
TransCity reasons over agent-retrieved multimodal data in a step-by-step manner.
Each reasoning step is explicitly grounded in the corresponding evidence, enabling traceable and auditable data-to-decision workflows.
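One way to picture such an auditable trace: each step carries the identifiers of the evidence it depends on, so any conclusion can be resolved back to its sources. The records below are invented examples, not actual model output.

```python
# Illustrative data-to-reasoning trace; claims and evidence ids are made up.
trace = [
    {"step": 1, "claim": "Flow on SH1 dropped 30% at 17:00",
     "evidence": ["nz_traffic:sensor_0042"]},
    {"step": 2, "claim": "Heavy rain began at 16:45 nearby",
     "evidence": ["weather:station_7"]},
    {"step": 3, "claim": "Rain, not an incident, explains the drop",
     "evidence": ["nz_traffic:sensor_0042", "weather:station_7"]},
]

def sources_for(step_idx, trace):
    """Resolve a conclusion back to every underlying data source."""
    return sorted({src for s in trace[:step_idx] for src in s["evidence"]})

audit = sources_for(3, trace)
```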
Answers with Confidence
TransCity produces answers accompanied by confidence-aware analysis.
By tracking evidence usage, expert routing, and reasoning depth, the system provides users with more reliable and trustworthy responses.
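As a toy illustration of combining those signals, the heuristic below blends evidence coverage, expert agreement, and reasoning depth into a single score. The formula and weights are entirely invented for this sketch and bear no relation to TransCity's actual scoring.

```python
# Toy confidence heuristic over the three signals named above (invented weights).
def confidence(evidence_used, evidence_retrieved, expert_agreement, depth, max_depth=8):
    coverage = evidence_used / max(evidence_retrieved, 1)   # evidence usage
    depth_term = min(depth, max_depth) / max_depth          # reasoning depth
    score = 0.5 * coverage + 0.3 * expert_agreement + 0.2 * depth_term
    return round(score, 3)

c = confidence(evidence_used=4, evidence_retrieved=5, expert_agreement=0.9, depth=6)
```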
Resources
Explore the latest tools, documentation, and best practices as you build with TransCity.
Open-Source Model
TransCity is released as an open-source foundation model to support sustainable smart city services for governments, enterprises, non-profit organizations, and individual users.
HuggingFace

Modular Codebase
The codebase is organized into distinct components, including four-stage training pipelines, an agent system, and local deployment tutorials.
Technology Book

Data Sources
TransCity is trained on large-scale multimodal smart city data, including question-answer pairs, reasoning chains, and real-world urban observations spanning 14 years across multiple continents.
Learn more