Geospatial embeddings

A promising new tool, with some trade-offs

Mar 26, 2026

New spatial data workflows typically face the same demands: acquire raw input data, compute derived variables, and build an analysis pipeline. Each new region or use case requires domain experts to repeat most of this work from scratch.

Geospatial embeddings offer a promising shortcut. By compressing large volumes of raw spatial data into compact numerical representations, geospatial embeddings can simplify analysis workflows and improve efficiency at scale. But this is a young field, and there are trade-offs to choosing embeddings over traditional spatial datasets.

Public embedding datasets are now available from Google DeepMind (AlphaEarth Foundations), Clay, and the University of Cambridge (TESSERA), among others. At Cecil, we’re exploring how these datasets can support the teams we work with. This newsletter shares what we’ve learned so far.

Fingerprints for a place

A geospatial embedding is a set of numbers that summarises the characteristics of a location, as observed by multiple input datasets over a given period. Each embedding is a point in multi-dimensional space, where the closer two points are, the more similar the locations they represent. In this way, an embedding acts as a fingerprint of a place.

Embeddings are like condensing a landscape onto a post-it note. They capture essential structure and discard redundant information.

Imagine sitting back-to-back with a partner. In front of you is a landscape painting. Your task is to get your partner to recreate it, but you can only pass them a single post-it note. To do this, you must compress the essential structure of the painting into a few words: mountain ridge, river bend, seasonal snow line, and so on. That post-it note is like an embedding. It preserves the information most important for distinguishing this landscape from others, while discarding redundant details.

Geospatial embeddings are generated by geospatial foundation models, which use self-supervised learning (i.e. learning from the data itself rather than from human-provided labels) to compress many spatial datasets into a vector of fewer dimensions. Input datasets typically include multispectral imagery, radar backscatter, elevation models, and climate variables.

Different models use different architectures, training strategies, and input datasets, resulting in embedding datasets with a variety of properties and dimensions. These design choices also affect which workflows each model is suited to. For instance, AlphaEarth Foundations and TESSERA produce one embedding per pixel, making them suited to pixel-level classification and fine-grained mapping. Most other models embed image patches, making them better suited to scene classification and monitoring large areas.

The table below summarises some key distinctions between models available today.

Unlike a single satellite sensor band, which corresponds to a fixed property like near-infrared reflectance, no single embedding dimension has a fixed meaning. The representation is distributed, so the “meaning” only exists in the relationships between all dimensions together. You cannot use a single dimension to identify a forest. Instead, you must use the full vector and apply analytical techniques to derive this answer.

Where embeddings add practical value

Geospatial embeddings address some common challenges facing data teams today (see our newsletters on spatial data foundations and spatial indexes):

Similarity search. Consider a known reference location, like a mangrove stand or deforestation frontier. By comparing its embedding with those of other locations, it is possible to identify sites with similar properties in embedding space.
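As a rough illustration of the similarity search described above, the sketch below ranks candidate locations by cosine similarity to a reference embedding. It is a minimal example under stated assumptions: the data is randomly generated, the 64 dimensions and the function names `cosine_similarity` and `most_similar` are hypothetical choices for this example, and cosine similarity is only one of several reasonable distance measures.

```python
import numpy as np

def cosine_similarity(reference, candidates):
    """Cosine similarity between one reference vector and each row of candidates."""
    reference = reference / np.linalg.norm(reference)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ reference

def most_similar(reference, candidates, top_k=3):
    """Indices and scores of the top_k most similar candidates, best first."""
    scores = cosine_similarity(reference, candidates)
    order = np.argsort(scores)[::-1][:top_k]
    return order, scores[order]

# Hypothetical data: random embeddings standing in for 1,000 candidate locations.
rng = np.random.default_rng(seed=0)
candidates = rng.normal(size=(1000, 64))

# Assumption for the demo: the reference site is a slightly noisy copy of location 42.
reference = candidates[42] + 0.01 * rng.normal(size=64)

indices, scores = most_similar(reference, candidates, top_k=3)
print(indices, scores)
```

With real embeddings, `candidates` would hold one row per pixel or patch in the search area, and the top-ranked rows would be the sites to inspect first.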

Clustering. Without any labelled data, embedding vectors can be clustered to produce first-pass maps of landscape types. Clusters typically correspond with meaningful land cover categories, making this valuable for stratified survey design or initial site baselining.
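To make the clustering idea above concrete, here is a small k-means sketch written with NumPy only. Everything in it is an assumption made for illustration: the three synthetic "landscape types", the 8 dimensions, the choice of k-means itself, and the farthest-first initialisation (used here so the toy example is deterministic). Real embeddings rarely separate this cleanly.

```python
import numpy as np

def farthest_first_init(embeddings, k):
    """Pick k starting centroids: the first embedding, then repeatedly the
    embedding farthest from every centroid chosen so far."""
    chosen = [0]
    min_dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return embeddings[chosen].copy()

def assign(embeddings, centroids):
    """Label each embedding with the index of its nearest centroid."""
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)

def kmeans(embeddings, k, n_iter=50):
    """Plain k-means: alternate assignment and centroid updates until stable."""
    centroids = farthest_first_init(embeddings, k)
    for _ in range(n_iter):
        labels = assign(embeddings, centroids)
        updated = centroids.copy()
        for j in range(k):
            members = embeddings[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(embeddings, centroids), centroids

# Hypothetical data: three well-separated groups of 100 embeddings each.
rng = np.random.default_rng(seed=0)
centres = np.array([[0.0] * 8, [10.0] * 8, [-10.0] * 8])
embeddings = np.vstack([c + rng.normal(size=(100, 8)) for c in centres])

labels, centroids = kmeans(embeddings, k=3)
print(np.bincount(labels))  # number of locations assigned to each cluster
```

The cluster labels carry no meaning on their own. Someone still has to inspect each cluster and decide which land cover type, if any, it corresponds to.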

Change detection. Comparing embeddings of the same location over time can reveal change, for instance by calculating similarity between two years to create a change score for each pixel. This is an effective triage tool for identifying where to focus further analysis.
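One way to turn the change detection idea above into a per-pixel change score is one minus the cosine similarity between two years, sketched below. The grid size, the 16 dimensions, the function name `change_score`, and the 0.5 threshold are all arbitrary assumptions for this example, and the data is synthetic.

```python
import numpy as np

def change_score(before, after):
    """Per-pixel change score: 1 - cosine similarity between two years.

    before, after: arrays of shape (height, width, dims).
    Returns shape (height, width); values near 0 mean little change,
    values near 2 mean the embedding points in the opposite direction.
    """
    dot = (before * after).sum(axis=-1)
    norms = np.linalg.norm(before, axis=-1) * np.linalg.norm(after, axis=-1)
    return 1.0 - dot / norms

# Hypothetical data: a 4 x 4 grid of embeddings for two years.
rng = np.random.default_rng(seed=0)
year_a = rng.normal(size=(4, 4, 16))
year_b = year_a.copy()
year_b[1, 2] = -year_a[1, 2]  # simulate one pixel whose embedding flipped

scores = change_score(year_a, year_b)

# Assumed threshold: flag pixels whose score exceeds 0.5 for follow-up.
changed = np.argwhere(scores > 0.5)
print(changed)
```

In practice the threshold would need tuning against known change events, since year-to-year noise in the embeddings will also produce small non-zero scores.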

Label efficiency. This is perhaps the most significant advantage for nature data teams, given that ground-truth data is expensive to collect. For instance, AlphaEarth Foundations reduced the number of samples needed to classify 87 crop categories from thousands per class to approximately 150 per class. The Element84 team also found that a simple logistic regression trained on AlphaEarth Foundations’ embeddings could detect bracken (an aggressive fern that degrades the UK uplands) from a small number of training samples. Embeddings significantly lower the barrier to entry for organisations that can’t justify spending tens of thousands of dollars on field data collection.
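To show the shape of a small-sample workflow like the one described above, the sketch below fits a logistic regression to 20 labelled embeddings per class and evaluates it on held-out samples. This is not Element84's code or data: the embeddings are synthetic and deliberately easy to separate, and the helper names (`make_samples`, `train_logistic_regression`, `predict`), learning rate, and step count are assumptions for illustration. In practice a library implementation of logistic regression would be the more usual choice.

```python
import numpy as np

def make_samples(n_per_class, dims, rng):
    """Synthetic stand-in for labelled embeddings: two classes with shifted means."""
    absent = rng.normal(loc=-1.0, size=(n_per_class, dims))
    present = rng.normal(loc=1.0, size=(n_per_class, dims))
    X = np.vstack([absent, present])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

def train_logistic_regression(X, y, lr=0.1, n_steps=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(n_steps):
        probs = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))
        error = probs - y
        weights -= lr * (X.T @ error) / len(y)
        bias -= lr * error.mean()
    return weights, bias

def predict(X, weights, bias):
    """Predict class 1 where the linear score is positive (probability > 0.5)."""
    return (X @ weights + bias > 0).astype(int)

rng = np.random.default_rng(seed=0)
X_train, y_train = make_samples(n_per_class=20, dims=16, rng=rng)   # few labels
X_test, y_test = make_samples(n_per_class=500, dims=16, rng=rng)    # held out

weights, bias = train_logistic_regression(X_train, y_train)
accuracy = (predict(X_test, weights, bias) == y_test).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```

The held-out evaluation matters more than the classifier. With real embeddings, the test samples should come from the region of interest rather than from the same sites as the training labels.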

Trade-offs with geospatial embeddings

Embeddings are not a like-for-like replacement for existing spatial datasets or methods. As with our recommendations for dataset evaluation, it is important to be clear-eyed about the trade-offs:

Interpretability. When a statistical model trained on embeddings labels a pixel as “degraded peatland,” you cannot trace that decision back to specific source data. Traditional spectral indices, like the Normalized Difference Vegetation Index (NDVI), have a direct biophysical interpretation, whereas embeddings do not. This is a constraint for applications that require explainability, such as regulatory reporting or auditing.

Temporal resolution. Most public embedding datasets are delivered at annual temporal resolution. This works for tracking slow changes like urban expansion, but seasonal changes like winter-to-summer crop rotations are compressed away. Unlike traditional annual composites, there is no way to inspect which seasonal signals the model retained. This also affects what “similar” means in embedding space. Models encoding a full year may place two sites close together because they share seasonal dynamics, not because they look alike on any one date. For higher-frequency use cases like deforestation, flood, or fire alerts, raw imagery or purpose-built datasets are still required.

Geographic bias. All foundation models carry the biases of their input data. For instance, Prithvi v1.0 was pre-trained exclusively on US data, while SatCLIP uses only 100,000 scenes globally. It is essential to validate model performance in your region of interest before relying on analysis results.

Platform dependency. AlphaEarth Foundations embeddings are available through Google Earth Engine, Google Cloud Storage, and Source Cooperative (thanks to Taylor Geospatial Engine, Radiant Earth, and the CNG community, with Jeff Albrecht leading the migration). Alternatives like Clay, TESSERA, and Prithvi are earlier in development and offer more flexibility. Cost is also a factor. The full AlphaEarth Foundations dataset is hundreds of terabytes, and migrating it out of Google Earth Engine cost tens of thousands of dollars in egress fees alone.

Looking ahead

Geospatial embeddings offer advantages where ground-truth data is scarce, preprocessing is complex, and the problem is not well understood… in other words, in many conservation and environmental monitoring applications.

However, embeddings trade interpretability, temporal granularity, and geographic coverage for convenience and efficiency. And while geospatial embedding models often report error reduction against specific benchmarks, carefully curated datasets and robust analysis workflows can still outperform embeddings while remaining biophysically interpretable. Embeddings are a new tool for compressing complexity, but they are not a silver bullet that eliminates it.

At Cecil, we’re monitoring these developments closely as we consider how embeddings fit into our product. If you’ve been experimenting with embedding models, we’d love to hear what you’ve found. Share your experiences in our Slack community, or explore Cecil’s documentation to discover related datasets available today.

Special thanks to Isaac Corley and the Cloud-Native Geospatial Forum for their work on embeddings product interoperability, Element84 for their examples of AlphaEarth Foundations embeddings in practice, and Simon Ilyushchenko at Google Earth Engine for his friendly review.

Recent updates at Cecil:

Selected by Isometric as one of their Earth Observation data partners
New datasets:
- IBAT STAR Threat Abatement
- PlanetSapling Global Major Mines
- Sylvera Biomass Atlas
- WRI Tropical Tree Cover 10 m and 70 m
- WCS Forest Landscape Integrity Index
- USGS Annual National Landcover Database
- USDA Cropland Data Layer 10 m and 30 m

A guest post by

Sonny Burniston

Exploring how AI and geospatial data can be used to understand and protect biodiversity.

The nature data newsletter
