1.087 Geographic Libraries (Python)#
S1 Rapid Discovery of the 2026 Python geographic-library landscape for working with geospatial vector data. Covers the domain’s functional roles: coordinate ingestion (EXIF GPS extraction via Pillow/exifread/piexif/exif), coordinate reference systems (pyproj), the geometry engine (shapely), vector format I/O (geojson, fiona/pyogrio, geopandas), geocoding (geopy), and map visualization (folium). Provides a category-first decision framework for assembling a stack one role at a time, with adoption signals, install reliability, and a CVE check per library. Scope is the vector + web-map side of geospatial Python; raster, PostGIS/server GIS, routing, and 3D/point-cloud are out of scope.
Explainer
Geographic Libraries (Python): A Guide for the Geospatially Curious#
Who This Is For#
You have a column of latitudes and longitudes in a spreadsheet. Or a customer table with street addresses. Or a request that sounds simple: “show our stores on a map,” “find the nearest warehouse,” “how far apart are these two GPS points?” You start typing, reach for the Pythagorean theorem you remember from school, and somewhere a cartographer feels a disturbance in the force.
This guide is for the developer, analyst, or technical lead who is not a GIS specialist and is trying to decide one thing: do I need to learn a geographic library, or can I get away with a few lines of arithmetic? It deliberately does not compare specific libraries (that is what the S1–S4 passes are for). It answers the prior question — is this domain even relevant to me, and why is it harder than it looks?
The Hardware Store Analogy#
Think of geographic work as a trip to a hardware store.
A novice walks in wanting to “hang a shelf” and grabs the first thing that looks like a tool. An experienced person knows the store is organized into aisles, each solving a different class of problem, and that using the wrong aisle’s tool turns a ten-minute job into a weekend of frustration.
Geographic libraries are the same. There is no single “geo tool.” There is a store with distinct aisles:
- The measuring aisle — tape measures, levels, squares. These ingest raw coordinates and tell you where things are and how to read them.
- The translation aisle — adapters, converters, thread gauges. These reconcile the fact that the world ships in incompatible standards (metric vs. imperial, but for coordinate systems).
- The cutting and joining aisle — saws, clamps, glue. These do geometry: intersect, buffer, union, “is this point inside that fence?”
- The fasteners and fittings aisle — the bins of file formats that let everything connect to everything else (GeoJSON, Shapefile, GeoPackage).
- The information desk — where you hand over an address and a clerk looks up exactly where it is. This is geocoding, and notice: it is a desk staffed by people/services, not a tool you take home.
- The showroom — the display models, the painted walls, the lighting. This is visualization: turning all the above into a map a human can actually read.
The expensive mistake is reaching into the wrong aisle. Most “naive” geo bugs are someone trying to cut a board (geometry) with a tape measure (raw coordinate math), or trying to fit a metric bolt into an imperial nut (mixing coordinate systems) and forcing it until something cracks.
The Core Problem: Coordinates Are Deceptively Hard#
Coordinates look like ordinary numbers. They are two floats. Surely you can treat them like points on graph paper?
No. Here is the pile of reasons why, each of which has burned real production systems.
1. The Earth is round, and your math probably assumes it’s flat#
Latitude and longitude are angles, not positions on a flat grid. Longitude measures east-west angle from the prime meridian; latitude measures north-south angle from the equator. They describe a point on a curved, slightly squashed sphere (an ellipsoid — the Earth bulges at the equator).
This means the naive distance formula:
```python
# WRONG over any real distance
dist = math.sqrt((lat2 - lat1)**2 + (lon2 - lon1)**2)
```
is meaningful only as an abstract number, not a real distance. One degree of longitude is about 111 km at the equator but shrinks to zero at the poles, where all meridians meet. So “0.5 degrees apart” means a wildly different real distance in Quito than in Reykjavík. Computing great-circle distance correctly requires trigonometry on the sphere (the haversine formula) or, for survey-grade accuracy, geodesic math on the ellipsoid. Get this wrong and your “nearest store” result is confidently, silently incorrect.
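For comparison, here is a minimal sketch of the haversine formula using only the standard library. The function name `haversine_km` and the mean-radius constant are illustrative choices, and the result is a spherical approximation, not survey-grade geodesic math on the ellipsoid.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# The same 0.5-degree longitude gap at two very different latitudes
near_quito = haversine_km(-0.18, -78.5, -0.18, -78.0)
near_reykjavik = haversine_km(64.15, -22.0, 64.15, -21.5)
```

Comparing `near_quito` with `near_reykjavik` shows the effect described above: the gap near Reykjavík comes out at well under half the distance of the same gap near Quito.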
2. A coordinate is meaningless without knowing what it’s measured against#
The number (37.42, -122.08) is not a location. It is a location only once you
say which coordinate reference system (CRS) it belongs to. A CRS combines a
datum (a model of the Earth’s shape and where its center sits) with a way of
expressing positions on it.
The global default for GPS and web maps is WGS84, identified by the code EPSG:4326. But there are thousands of others — national grids, projected systems measured in meters, historical datums. The same physical street corner has different numeric coordinates in different systems, sometimes differing by hundreds of meters.
The classic catastrophe: you load data that is actually in a projected, meters-based system but treat it as WGS84 lat/lon, or you do the reverse and read degree values as meters, which squeezes every point to within a few hundred meters of the origin. Or worse, a few rows have missing or zeroed coordinates. Plot them and your data lands in the Atlantic Ocean off the coast of Africa, at latitude 0, longitude 0: the dreaded “Null Island.” Every geo team has seen it. It is the universal symptom of a CRS or missing-data bug. Mixing coordinate systems does not throw an error; it produces plausible-looking nonsense.
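Because these failures are silent, a cheap sanity check at ingestion time can surface them early. The sketch below is one possible shape for such a check; `coordinate_problems` is a hypothetical helper, and it only catches the obvious symptoms (missing values, values outside the range of degrees, and an exact 0, 0 pair), not every CRS mismatch.

```python
def coordinate_problems(lat, lon):
    """Return a list of warnings for one (lat, lon) pair expected to be in degrees."""
    if lat is None or lon is None:
        return ["missing value"]
    problems = []
    if not (-90.0 <= lat <= 90.0) or not (-180.0 <= lon <= 180.0):
        # Typical of projected (meters-based) data, or of swapped lat/lon
        problems.append("out of range for degrees")
    if lat == 0.0 and lon == 0.0:
        # Typical of zero-filled placeholders that will plot at Null Island
        problems.append("exactly (0, 0)")
    return problems


rows = [(37.42, -122.08), (0.0, 0.0), (4182000.0, 551000.0), (None, 12.5)]
report = [coordinate_problems(lat, lon) for lat, lon in rows]
```

A pair that passes this check can still be in the wrong CRS; the check only narrows the search when something lands in the ocean.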
3. Formats proliferate, and they disagree about basic things#
Geographic data ships in a zoo of formats: Shapefile (a 1990s format that is actually 3–4 files pretending to be one), GeoJSON, GeoPackage, KML, WKT, and more. They disagree on details that bite you, most notoriously coordinate order. Is it (longitude, latitude) or (latitude, longitude)? GeoJSON says lon-lat. Many mapping APIs and humans say lat-lon. Swap them and your point in Paris appears in the Indian Ocean.
GeoJSON (RFC 7946) is worth singling out because it is the lingua franca of the web-mapping world. By specification it is always WGS84 and always longitude-first, then latitude. If you remember one format rule, remember that one.
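To make the ordering rule concrete, here is a small sketch that builds a GeoJSON Point feature with the standard `json` module. The helper name `point_feature` and its keyword arguments are illustrative; the point is that the function accepts lat and lon by name and writes them longitude-first.

```python
import json


def point_feature(lat, lon, properties=None):
    """Build a GeoJSON Feature dict; 'coordinates' is [lon, lat], per RFC 7946."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties or {},
    }


paris = point_feature(lat=48.8566, lon=2.3522, properties={"name": "Paris"})
text = json.dumps(paris)
```

Taking the two values as keyword arguments is one way to keep the swap from happening at the call site, where a bare pair of floats gives no hint of which is which.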
4. “Near” is not a simple comparison#
“Find everything within 5 km” sounds like a numeric filter. But on a sphere,
distance is curved; near a date line or a pole, naive bounding boxes break;
and checking “is this delivery address inside this zone polygon?” is a real
computational-geometry operation, not a < comparison. Doing it efficiently over
millions of rows needs spatial indexes (R-trees and friends). This is genuine
algorithmic work, and it is exactly what mature libraries provide for free.
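As an illustration of why containment is more than a comparison, here is a sketch of the classic ray-casting test on a flat plane. It is deliberately naive: `point_in_polygon` is a hypothetical helper that ignores holes, points exactly on an edge, the date line, and the Earth's curvature, and it scans every edge instead of using a spatial index.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: is (x, y) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x position where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
in_zone = point_in_polygon(2.0, 1.0, zone)
outside_zone = point_in_polygon(5.0, 1.0, zone)
```

Every caveat listed in the lead-in is something a mature geometry engine already handles, which is the practical argument for not maintaining code like this yourself.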
5. Addresses are not coordinates#
“123 Main St, Springfield” is a human label, not a position. Converting between the two — geocoding (address to coordinate) and reverse geocoding (coordinate to address) — is a fundamentally different kind of task. It is not math; it is a lookup against a constantly-changing database of the world, almost always performed by a network service. That means: it can be slow, it can be wrong, it costs money or has rate limits, and you usually cannot do it offline. Treating geocoding like a pure function is one of the most common architectural surprises in this domain.
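One way to plan for this is to wrap whatever lookup you use in a layer that caches results, spaces out calls, and tolerates failure. The sketch below assumes a hypothetical `lookup` callable that takes an address and returns a `(lat, lon)` tuple or `None`; it is not any particular library's API.

```python
import time


class CachedGeocoder:
    """Wrap a slow, fallible lookup callable with a cache and a minimum call spacing."""

    def __init__(self, lookup, min_delay_s=1.0):
        self._lookup = lookup            # hypothetical: address -> (lat, lon) or None
        self._min_delay_s = min_delay_s  # assumed provider limit, e.g. one call per second
        self._cache = {}
        self._last_call = None

    def geocode(self, address):
        key = " ".join(address.lower().split())  # normalize case and whitespace
        if key in self._cache:
            return self._cache[key]              # no network call needed
        if self._last_call is not None:
            wait = self._min_delay_s - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)                 # self-limit instead of relying on the server
        self._last_call = time.monotonic()
        try:
            result = self._lookup(address)
        except Exception:
            return None                          # failures are not cached, so a retry is possible
        self._cache[key] = result
        return result
```

In use, the first call for an address goes through to the service and later calls for the same normalized address are answered from memory. A real deployment would likely persist the cache and distinguish error types, both of which are omitted here.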
The Solution Categories#
These map directly onto the hardware-store aisles. Note: categories, not products.
Coordinate ingestion (the measuring aisle)#
Parsing, validating, and doing correct math on raw coordinates: great-circle and geodesic distances, bearings, midpoints, “move 10 km northeast from here.” This is the entry-level need and where many projects can stop.
Coordinate reference systems (the translation aisle)#
Knowing what coordinates mean and converting between systems — reprojecting from WGS84 lat/lon into a meters-based local projection so that “distance” and “area” come out in real units. The heavy lifting here is done by the PROJ engine.
Geometry operations (the cutting and joining aisle)#
Working with shapes, not just points: polygons, lines, buffers (“everything within 500 m of this road”), intersections, unions, containment tests. The foundational engine is GEOS (a C++ port of the JTS geometry suite).
Vector format I/O (the fasteners aisle)#
Reading and writing the format zoo — Shapefile, GeoJSON, GeoPackage, and dozens more — while preserving CRS metadata. The workhorse engine is GDAL/OGR.
Geocoding (the information desk)#
Address-to-coordinate and back, via services. A separate, network-bound, rate-limited, often-paid concern. Plan for latency, failures, caching, and terms of use.
Visualization (the showroom)#
Turning data into static maps or interactive web maps a human can read — choosing basemaps, projections-for-display, colors, and tiles.
Why the C/C++ Engines Exist (GDAL, GEOS, PROJ)#
You will keep hearing three names: GDAL (formats and raster/vector data access), GEOS (geometry operations), and PROJ (coordinate transformations). These are decades-old C/C++ libraries that encode an enormous amount of hard-won correctness — the precise math of map projections, the edge cases of geometry predicates, the quirks of fifty file formats.
Python geographic libraries are, to a large degree, friendly wrappers around these engines. This matters for two practical reasons:
- You inherit correctness you could never reasonably reimplement. The ellipsoid math, the projection catalogs, the topology handling — that is thousands of person-years of work. Rolling your own is not “a few functions”; it is signing up to rediscover bugs that were fixed in 2007.
- You inherit an installation reality. Because these are native libraries, installation can be the genuinely hard part. This is why packaging choices (wheels, conda) matter and why “it works on my machine” is a real risk in this domain. A library’s value is partly how cleanly it ships the C engines underneath.
Trade-offs: Library vs. Roll-Your-Own vs. Full GIS Platform#
Leaning on geographic libraries#
Pros: Correctness out of the box — proper distance, reprojection, geometry, and format support. You stop reinventing the ellipsoid. Interoperability with the rest of the ecosystem (everything speaks GeoJSON and reads Shapefiles). A clear upgrade path as needs grow.
Cons: A learning curve (CRS, datums, coordinate order will trip you at least once). Heavier dependencies, sometimes with painful native installs. For a genuinely trivial need, it can feel like buying a full toolbox to hang one picture.
Rolling your own#
Pros: Zero dependencies, total control, and for the truly simple case — “distance between two points, accuracy to a kilometer” — a single haversine function really is enough and entirely legitimate.
Cons: A false floor. The moment requirements creep (“now in meters,” “now with polygons,” “now reading a Shapefile,” “now reprojected”), you are reimplementing GEOS and PROJ badly. Most roll-your-own geo code is one feature request away from becoming a maintenance liability, and it will accumulate the exact silent bugs (Null Island, swapped coordinates) the libraries already prevent.
A full GIS platform (QGIS, PostGIS, ArcGIS, etc.)#
Pros: Industrial strength. Spatial databases that index and query millions of features, rich desktop tools for analysts, enterprise data management.
Cons: Heavy. Operationally and conceptually a big commitment. Overkill if you just need to enrich a data pipeline or draw a map in a web app. Often the answer is “a Python library talking to PostGIS,” not “a Python library or a platform.”
When You Need This Domain — and When You Don’t#
You DO need geographic libraries when:#
- Distances or areas must be real and trustworthy. Anything billed, routed, or decided on (delivery zones, service radii, “nearest” anything) cannot ride on flat-earth arithmetic.
- You work with shapes, not just points. Regions, boundaries, zones, “inside or outside,” buffers — that is geometry, and you want a real geometry engine.
- You exchange data with other systems or formats. The instant a Shapefile, GeoJSON file, or another team’s CRS enters the picture, you need proper I/O and reprojection.
- You need to draw real maps. Projections, basemaps, and tiles for a map a human reads correctly.
- You convert between addresses and coordinates. That is geocoding, and you want a real client that handles services, limits, and failures.
You probably DON’T need them when:#
- You have a handful of points and need rough distance. A haversine helper is fine. Accuracy to within a kilometer over short hops is genuinely adequate for many uses.
- Coordinates are just opaque labels. If lat/lon are merely values you store and display verbatim, never measure or transform, you do not need the toolbox.
- Your “map” is a pin on a third-party widget. If you hand coordinates to a hosted map (and they are already WGS84 lat/lon), the widget does the geo work.
- One canonical CRS, no geometry, no format I/O. If nothing reprojects and nothing has a shape, you are below the threshold where the libraries earn their weight.
The honest middle ground#
Most projects start in the “don’t need it” zone and drift into “need it” without noticing. A useful rule: the first time you find yourself writing math about the Earth’s curvature, converting between coordinate systems, or parsing a geo file format — stop and adopt a library. Those three moments are the signal that you have left the measuring aisle and wandered into territory where decades of existing, correct code are waiting for you. Reaching for it then is not over-engineering; it is recognizing which aisle of the store you are standing in.
The One-Paragraph Takeaway#
Coordinates pretend to be simple numbers and are not: the Earth is a lumpy sphere, every coordinate is meaningless without a CRS, file formats disagree about even coordinate order, “near” is real geometry, and addresses are a network lookup, not arithmetic. Geographic libraries exist to wrap the hard-won C engines (GDAL, GEOS, PROJ) so you inherit correctness instead of rediscovering Null Island. If you only ever need rough distance between a few points, a haversine function is honest and sufficient. The moment shapes, reprojection, file formats, or trustworthy measurements enter the picture, walk into the right aisle and pick up the proper tool.
S1: Rapid Discovery
S1 Rapid Discovery — Approach#
Domain#
This S1 surveys the Python geographic-library landscape (2026) for working with geospatial vector data: getting coordinates in, representing and operating on geometry, keeping coordinate systems correct, looking up addresses, and visualizing results on a map. The audience is anyone choosing libraries in this domain — whether they are building a data pipeline, an analysis notebook, a web-map back end, or a one-off conversion utility.
Scope is deliberately bounded to the vector + web-map side of geospatial Python. Raster and satellite imagery (rasterio, GDAL raster pipelines), server-side GIS / PostGIS / tile servers, routing engines, and 3D / point-cloud geospatial (see 1.083) are out of scope.
Functional categories evaluated#
Geographic work in Python decomposes into a handful of roles, and most projects assemble a stack from one or more of them:
- Coordinate ingestion / EXIF GPS extraction: pulling coordinates out of source data, including GPS tags embedded in photos (`Pillow`, `exifread`, `piexif`, `exif`).
- Coordinate reference systems (CRS): projecting and transforming between datums and grids (`pyproj`).
- Geometry engine: constructing and operating on points, lines, and polygons, including predicates, buffers, clustering, and measurement (`shapely`).
- Vector format I/O: reading and writing GeoJSON, Shapefile, GeoPackage, etc. (`geojson`, `fiona`/`pyogrio`, `geopandas`).
- Geocoding: forward/reverse lookup between addresses and coordinates (`geopy`).
- Map visualization: rendering interactive maps from Python (`folium`).
Selection criteria#
For each library: adoption signals (GitHub stars, PyPI downloads, latest release date, maintenance health), install reliability (does it require a system GDAL/GEOS/PROJ build, or do binary wheels bundle the native deps?), a CVE check, and the category scenarios it fits best versus where a lighter or heavier peer is the better call.
Method notes#
- Data current as of June 2026. Stars and download counts are approximate (rounded GitHub display; pypistats 30-day windows that include CI/mirror traffic, so real human usage is lower).
- “No known CVEs” means no published first-party advisory was found — absence of evidence, not proof of absence. Transitive risk through bundled GDAL/PROJ/zlib/libtiff is noted where relevant.
- Per 4PS S1 rules this is a shopping comparison, not a manual: no install commands, no code samples. Install reliability is discussed as prose because it is a genuine selection criterion in this domain — historical GDAL build pain has sunk projects.
- Per RAIL 0, judgments are category-first: each library is rated on its fit across the domain’s use cases, not against any single triggering task. Use-case-specific picks live in the S3 personas (and any separate project memo), not in these S1 verdicts.
Domain fact worth surfacing early#
GeoJSON is WGS84 by specification. RFC 7946 mandates coordinates in longitude, latitude
order on the WGS84 datum (EPSG:4326). Most consumer GPS sources (including photo EXIF) are
already WGS84. So a surprising amount of GeoJSON-centric work performs no coordinate
transformation at all, which determines whether pyproj belongs in a given stack. This is
a domain property, not a property of any one project, and it shapes the recommendation below.
EXIF GPS readers (Pillow · ExifRead · piexif · exif)#
Overview#
- Category role: Coordinate ingestion, i.e. extracting GPS coordinates embedded in photo EXIF metadata, a common upstream step in getting geographic data into Python. EXIF stores GPS as DMS rationals (degree/minute/second triplets) plus N/S/E/W reference tags. Converting to WGS84 decimal degrees is a short hand-roll (`dd = deg + min/60 + sec/3600`, then negate for S or W); most readers leave this step to the caller, and only a few ship small helpers for it. Python’s standard library has no EXIF/GPS support, so a third-party reader is required.
This file compares the four practical readers as a group, since they compete for one role.
ExifRead (ExifRead, import exifread) — healthiest dedicated reader#
- Stars: ~950 · Downloads: ~1.0M/month
- Latest: 3.5.1 (2025-08-23), “Production/Stable”, Python 3.7–3.13
- Maintenance: Active and healthy
- License: BSD-3-Clause
- GPS: Documented GPS sub-IFD support, read-only; exposes lat/lon DMS + refs and ships utility helpers for DMS→decimal.
- Install: pure Python, no native deps — clean anywhere.
- CVEs: none found (Snyk health “Healthy”).
- Verdict: the smallest correct tool for read-only GPS extraction with no other image needs; healthy and dependency-free.
Pillow (Pillow) — the imaging library that also reads GPS#
- Stars: ~13.6k · Downloads: ~432M/month (by far the largest)
- Latest: 12.2.0 (2026-04-01), very active, Tidelift-backed, Python ≥3.10
- License: MIT-CMU
- GPS: `getexif().get_ifd(...)` exposes the GPS IFD as DMS rationals + refs; no decimal-degrees helper (hand-roll).
- Install: ships binary wheels, installs cleanly.
- CVEs: heavy image-parser CVE history, so keep current. Recent: CVE-2026-42308 (font glyph integer overflow → DoS, fixed 12.2.0), CVE-2026-25990 (buffer overflow, 12.x), CVE-2025-48379 (DDS heap overflow, fixed 11.3.0), CVE-2024-28219 (`strcpy` overflow, fixed 10.3.0), CVE-2023-50447 (`ImageMath.eval` RCE, CVSS 8.1, fixed 10.2.0). Pattern: recurring C-layer overflows in parsers plus an `eval`-class RCE.
- Verdict: the natural choice for workflows already decoding images (thumbnails, resizing, format conversion), since GPS can be read from the same library rather than adding a second one. For GPS alone it is a large attack surface; pin a current version regardless, since reading EXIF still parses image bytes.
piexif (piexif) — avoid (stale)#
- Stars: ~390 · Downloads: ~2.4M/month (high, but legacy installs)
- Latest: 1.1.3 (2019-07-01) — ~7 years stale; possibly discontinued
- License: MIT · pure Python · read and write GPS
- Advisory: SNYK-PYTHON-PIEXIF-2312874, arbitrary file read via unsanitized `load()` input; no fix (unmaintained).
- Verdict: its one differentiator is writing EXIF back; for reading, staleness and an unpatched advisory rule it out in favor of ExifRead.
exif (exif, import exif) — avoid (dormant)#
- Downloads: ~90k/month (smallest) · Latest: 1.6.0 (2023-01-30), no confirmed later release
- License: MIT · read+write (historical GPS-write bugs)
- GPS: most ergonomic API (`gps_latitude`/`gps_longitude` attributes).
- CVEs: none found.
- Verdict: nicest API, but dormant since early 2023, so not a safe long-term dependency.
Group recommendation#
ExifRead for read-only GPS extraction (healthy, tiny, dedicated). Pillow when the workflow already decodes images, kept on a current version. piexif/exif avoided (stale/dormant). Whichever reader is chosen, the DMS→decimal step is hand-rolled in a few lines.
fiona (and pyogrio)#
Overview#
- Type: Python API/CLI for vector GIS I/O over OGR/GDAL
- License: BSD-3-Clause
- Maintainer: Toblerity/Fiona (same org as shapely)
- Category role: Vector format I/O — the multi-format, GDAL-backed option
Adoption signals (2026-06)#
- GitHub stars: ~1.2k
- PyPI downloads: ~5.7M/month
- Latest release: 1.10.1 (2024-09-16). FLAG: no 2025/2026 release (>18 months).
- Maintenance: Active but slowing into maturity/maintenance mode. Release cadence dropped partly because pyogrio now handles most vector I/O for GeoPandas.
Key capabilities#
- Streams features as plain Python dicts; reads and writes the full catalog of OGR vector drivers, with access to layer schemas, CRS, and feature metadata.
- The general-purpose choice when a stack must speak many formats (Shapefile, GeoPackage, GML, KML, GeoJSON, …), not just one.
- pyogrio is the modern, faster sibling: a vectorized OGR reader/writer (bulk read to arrays/dataframes) that has become GeoPandas’ default I/O engine.
Install reliability#
- Historic GDAL build pain is largely RESOLVED in 2026. PyPI binary wheels now bundle GDAL+GEOS+PROJ (manylinux/macOS/Windows), so installs work without a system GDAL. Caveats: the bundled wheels omit many optional GDAL drivers (e.g. GML) and the docs label them “not for production”; for full driver coverage or production use, build against a system GDAL or use conda-forge. This remains the most build-sensitive member of the domain.
CVE check#
- No CVEs in fiona’s own code. Two transitive advisories via bundled GDAL, both mitigated in current 1.10.x wheels:
  - CVE-2023-45853 (zlib/MiniZip overflow): fixed in 1.10 wheels (GDAL 3.8.4+).
  - CVE-2020-14152 (libjpeg): low severity; maintainers state not vulnerable.
Trade-offs vs alternatives#
- For: The right tool when a workflow must read/write formats beyond GeoJSON, or do schema-aware streaming of large vector datasets without loading everything into memory.
- Against: It drags in the entire GDAL native stack. When GeoJSON is the only format needed, the zero-dependency `geojson` package does the same job without that weight, and fiona’s slowed release cadence plus the “not for production” wheel caveat add friction.
- vs geopandas: geopandas now uses pyogrio by default and treats fiona as optional; for tabular analysis prefer geopandas, for low-level streaming I/O fiona/pyogrio stand alone.
Best for#
- Multi-format vector I/O, large vector datasets, or schema-driven ETL where streaming feature-by-feature matters.
folium#
Overview#
- Type: Python library that generates Leaflet.js maps
- License: MIT
- Maintainer: python-visualization/folium
- Category role: Map visualization — interactive maps from Python, exported as HTML
Adoption signals (2026-06)#
- GitHub stars: ~7.4k
- PyPI downloads: ~3.4M/month
- Latest release: 0.20.0 (2025-06-16) — healthy/active, Python 3.9–3.13
Key capabilities#
- Produces interactive Leaflet.js maps exportable as a standalone HTML file (also renders inline in Jupyter / Streamlit).
- First-class GeoJSON support: `GeoJson` layers with styling, highlight, tooltip, and popup; `Choropleth` for thematic shading. Tile layers default to OpenStreetMap.
- Marker clustering, layer control, and common map widgets out of the box.
Install reliability#
- Clean. Pure-Python (depends on Jinja2, branca, requests); no native build. Installs anywhere.
CVE check#
- No filed direct CVEs (Snyk). Inherent usage caveat (not a logged advisory): folium injects user-supplied strings (popups, tooltips, GeoJSON properties, raw HTML) into the output HTML, so untrusted/unescaped content is an XSS sink. Sanitize any user-controlled strings before rendering. Low risk for personal/trusted-data maps.
Trade-offs vs alternatives#
- For: The fastest path from Python data to an interactive map. One call turns points or a GeoJSON layer into a self-contained HTML page — ideal for notebooks, dashboards, quick visual verification, and reports where the map is the deliverable and Python owns rendering.
- Against (folium vs. raw Leaflet/JS): folium generates a Leaflet page from Python; a hand-written Leaflet front end gives full control over styling, interaction, and bundling with zero Python runtime weight. Projects that already ship a Leaflet/JS front end consuming GeoJSON gain little from folium in production — they would be maintaining two renderers. The natural split: folium for Python-owned views and quick previews; raw Leaflet for a production web front end that already exists.
- vs other Python map libs (plotly, kepler.gl, pydeck): folium is the lightweight, OSM/Leaflet-native choice; the others target large-scale or WebGL/3D visualization.
Best for#
- Notebook/exploratory mapping, quick visual verification of geospatial output, and reports or dashboards where Python generates the map directly.
geojson (PyPI package geojson)#
Overview#
- Type: Pure-Python library
- License: BSD-3-Clause
- Maintainer: Jazzband community collective (jazzband/geojson)
- Category role: Vector format I/O — the lightweight, single-format option
Adoption signals (2026-06)#
- GitHub stars: ~990
- PyPI downloads: ~3–4M/month
- Latest release: 3.3.0 (2026-05-28) — recent, despite an infrequent cadence
- Python support: 3.10–3.14, OS-independent
- Maintenance: Active; Jazzband stewardship (10-year anniversary, March 2026)
Key capabilities#
- Class types for every GeoJSON geometry (Point, LineString, Polygon, Multi*, GeometryCollection) plus Feature and FeatureCollection.
- Encode/decode to and from JSON with validation of GeoJSON structure.
- Coordinate-precision control (trim float noise in output).
- Supports Python’s `__geo_interface__` protocol, so it interoperates with shapely and other geo libraries without a hard dependency on them.
Install reliability#
- Best in class for this domain. Pure Python, zero dependencies, no native/binary components. Platform-independent wheel installs cleanly anywhere, including locked-down or air-gapped environments. Nothing to compile, no GDAL/GEOS/PROJ.
CVE check#
- No known CVEs for this package. (CVE-2016-1000225 and CVE-2022-43776 surface in naive searches but belong to other products — Sequelize and Metabase — not this library.)
Trade-offs vs alternatives#
- For: The lightest possible way to read and write GeoJSON specifically. Zero build risk and zero transitive CVE surface make it the natural choice when GeoJSON is the only format in play and the work is serialization rather than analysis.
- Against: Not an analysis library, so no spatial predicates, joins, buffers, or reprojection. It is the serialization layer only. Pair it with shapely (via the shared `__geo_interface__` protocol) when geometry operations are needed.
- vs fiona / geopandas: both can also write GeoJSON, but each pulls in the GDAL/GEOS native stack. When a project needs many vector formats or tabular analysis, those are the right tools; when GeoJSON is the sole format, this package avoids that weight entirely.
Best for#
- Any stack whose vector I/O is GeoJSON-only and serialization-focused.
- Environments where dependency footprint and install reliability are first-order concerns (CI, containers, air-gapped, embedded).
geopandas#
Overview#
- Type: Geospatial extension of pandas (GeoDataFrame / GeoSeries)
- License: BSD-3-Clause
- Maintainer: GeoPandas org (NumFOCUS fiscally sponsored)
- Category role: Vector I/O + geometry + CRS, integrated — the all-in-one analysis layer
Adoption signals (2026-06)#
- GitHub stars: ~5.2k
- PyPI downloads: ~16M/month
- Latest release: 1.1.3 (~March 2026)
- Python support: ≥3.10
- Maintenance: Very active, the de-facto geospatial dataframe library
Key capabilities#
- A pandas DataFrame with a geometry column: spatial joins, overlays, dissolve/aggregate, CRS reprojection, multi-format read/write, and PostGIS round-trips, with plotting helpers.
- Bundles the whole stack — built on shapely (geometry), pyproj (CRS), and pyogrio (I/O). As of 1.0, fiona is no longer a core dependency; pyogrio is the default engine and fiona is optional.
- Reads and writes GeoJSON directly, so it can serve as both ingest and output layer.
Install reliability#
- Clean in 2026. Because its C-backed deps (shapely/pyproj/pyogrio) all ship self-contained binary wheels with bundled GEOS/PROJ/GDAL, a plain install yields a working stack on mainstream platforms with no manual GDAL build. The switch to pyogrio resolved most of the historic fiona+GDAL friction.
CVE check#
- CVE-2025-69662: SQL injection in `to_postgis()` (CVSS 8.6 HIGH, CWE-89). Affects all versions before 1.1.2; fixed in 1.1.2+ (current 1.1.3 is patched). Relevant only to workflows that write to PostGIS.
Trade-offs vs alternatives#
- For: The most productive choice for analysis — many thousands of features, spatial joins against admin boundaries, overlays, dissolves, choropleths, and CRS-heavy ETL. One import provides the whole toolkit and a familiar pandas API.
- Against: It installs pandas + shapely + pyproj + pyogrio (and their native GEOS/PROJ/GDAL). When a workflow only needs to serialize coordinates or convert one format, that is far more machinery than the job requires; the lighter `geojson` package (or shapely alone for geometry) covers those cases with a fraction of the footprint.
- The dividing line is analysis vs. serialization: geopandas wins the former decisively and is over-provisioned for the latter.
Best for#
- Tabular geospatial analysis at scale, spatial joins/overlays, CRS-heavy ETL, and exploratory geospatial data science.
geopy#
Overview#
- Type: Python geocoding client (wraps 30+ geocoding web services)
- License: MIT
- Maintainer: geopy/geopy
- Category role: Geocoding — forward (address → lat/lon) and reverse (lat/lon → address)
Adoption signals (2026-06)#
- GitHub stars: ~4.8k
- PyPI downloads: ~15M/month
- Latest release: 2.4.1 (2023-11-23) — FLAG: ~2.5 years since last release, though the repo shows commit/issue activity into 2026. Reads as mature/low-churn rather than abandoned, but release recency is a watch item.
- Python support: ≥3.7
Key capabilities#
- A uniform client over 30+ backends: Nominatim (OpenStreetMap), Google Maps, Bing, ArcGIS, HERE, OpenCage, TomTom, Mapbox, Photon, Pelias, Yandex, and more.
- Forward and reverse geocoding through one consistent interface, so providers can be swapped without rewriting call sites.
- Ships `RateLimiter`/`AsyncRateLimiter` helpers (min-delay + retry) for staying inside provider usage policies.
Install reliability#
- Clean. Pure-Python client with light dependencies; no native build. Installs anywhere.
CVE check#
- No known CVEs for the package (Snyk; excludes transitive deps).
Operational caveat — Nominatim (the free, no-key OSM backend)#
- OSM’s usage policy caps Nominatim at an absolute maximum of 1 request/second across all your traffic, and requires a custom `User-Agent` (stock/default UAs are rejected). The server does not throttle for you; you must self-limit (geopy’s `RateLimiter` does this).
- Free and keyless, but low-volume: fine for occasional lookups, not bulk geocoding. At scale, use a paid backend (Mapbox/Google/OpenCage) with an API key.
Trade-offs vs alternatives#
- For: The standard way to add address ↔ coordinate lookup to a Python workflow. Its provider-agnostic interface is the main draw — start free on Nominatim, scale to a paid backend later without rewriting code.
- Against: Network-dependent and rate-limited, so it is never part of a deterministic offline core. Common roles are narrow and often optional: enriching records with place names, or filling coordinates for data that lacks them. Workflows whose sources already carry coordinates may not need it at all.
Best for#
- Geocoding/enrichment: deriving coordinates from addresses, or attaching human-readable place names to coordinate data; provider portability across geocoding services.
pyproj#
Overview#
- Type: Python interface to the PROJ C library
- License: MIT (bundled PROJ has its own license)
- Maintainer: pyproj4 (PROJ / OSGeo ecosystem)
- Category role: Coordinate reference systems — projections and transforms
Adoption signals (2026-06)#
- GitHub stars: ~1.2k
- PyPI downloads: ~21M/month
- Latest release: 3.7.2 (2025-08-14)
- Python support: 3.11–3.14
- Maintenance: Active, the canonical CRS engine for Python
Key capabilities#
- Transform coordinates between any two CRS (datum shifts, EPSG code lookups) via its Transformer/CRS APIs.
- Geodesic calculations on the ellipsoid (distance, forward/back azimuth) via the `Geod` API: true on-the-ground distances rather than planar approximations.
- Full EPSG registry access and CRS metadata inspection.
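As a rough illustration of the two API families listed above, the sketch below reprojects one point with `Transformer` and solves one geodesic inverse problem with `Geod`. It assumes pyproj is installed; the Seattle and Portland coordinates are approximate sample values chosen for illustration.

```python
from pyproj import Geod, Transformer

# always_xy=True keeps (lon, lat) / (x, y) order regardless of the CRS axis definition.
to_mercator = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
x, y = to_mercator.transform(-122.33, 47.61)   # lon, lat -> Web Mercator metres

# Geodesic inverse problem on the WGS84 ellipsoid:
# returns forward azimuth, back azimuth, and distance in metres.
geod = Geod(ellps="WGS84")
fwd_az, back_az, dist_m = geod.inv(-122.33, 47.61, -122.68, 45.52)
```

Building the `Transformer` once and reusing it for many points is the usual pattern, since construction is the expensive step.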
Install reliability#
- Clean. PyPI binary wheels (Windows/macOS/manylinux2014) bundle compiled PROJ, so no system PROJ is needed. Requires pip 19.3+. Wheels do not bundle transformation grids (downloaded on demand or via `pyproj sync`); the sdist and conda-forge link a system PROJ.
CVE check#
- No known CVEs (none in GitHub Security Advisories, NVD, or distro trackers). Note transitive risk: PROJ pulls libtiff/SQLite, which carry their own CVEs over time — a dependency-chain concern, not a pyproj-assigned vulnerability.
Trade-offs vs alternatives#
- For: Indispensable whenever data crosses coordinate systems — projected national grids or Web Mercator to/from lon-lat, datum conversions, accurate geodesic distance/azimuth. It is the only mature, full-featured CRS engine in the Python ecosystem.
- Against: Pure CRS — no geometry, no I/O. And it is unnecessary in WGS84-only workflows: because GeoJSON is WGS84 by spec (RFC 7946) and most GPS sources are already WGS84, stacks that stay in lon-lat throughout never invoke it. Whether pyproj belongs in a stack is essentially a yes/no question of “does this data change coordinate systems?”
- For one-off WGS84 distance needs, an inline haversine often suffices and avoids the dependency; pyproj earns its place when accuracy across datums/projections matters.
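For reference, the inline haversine mentioned above can be sketched with the standard library alone. This is a spherical approximation; the mean-radius constant and the function name are choices made for this example, and results can differ from an ellipsoidal geodesic by a fraction of a percent.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; treats the Earth as a sphere


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Approximate Seattle -> Portland coordinates, for illustration only.
d = haversine_m(47.61, -122.33, 45.52, -122.68)
```

Note the argument order here is (lat, lon), the opposite of GeoJSON's (lon, lat); mixing the two is a common source of silent errors.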
Best for#
- Any workflow that ingests or emits projected / non-WGS84 coordinates.
- High-accuracy geodesic distance and azimuth computation.
S1 Recommendation — Python Geographic Libraries#
How to choose in this domain#
Python geographic work is a stack-assembly problem: pick one library per role you
actually need, and skip the roles you don’t. The single biggest sizing question is
WGS84-only vs. multi-CRS — because GeoJSON is WGS84 by spec (RFC 7946) and most consumer
GPS is already WGS84, a large class of workflows never reproject and can omit pyproj
entirely. The second question is serialize vs. analyze — light serialization favors small
single-purpose libraries; analysis over many features favors the integrated dataframe stack.
Library roster by role#
| Role | Library | When it’s the right call | Install 2026 | First-party CVEs |
|---|---|---|---|---|
| Vector I/O (GeoJSON only) | geojson | One format, serialization-focused | Pure Python, zero deps | None |
| Vector I/O (many formats) | fiona / pyogrio | Shapefile/GPKG/GML/…; streaming large data | Wheels bundle GDAL | Transitive, mitigated |
| Vector I/O + analysis | geopandas | Spatial joins/overlays at scale, CRS-heavy ETL | Wheels (full stack) | CVE-2025-69662 (to_postgis) |
| Geometry engine | shapely | Buffers, clustering, predicates, distance | Wheels bundle GEOS | None |
| CRS / transforms | pyproj | Data crosses coordinate systems | Wheels bundle PROJ | None |
| Geocoding | geopy | Address ↔ coordinate lookup, enrichment | Pure Python | None |
| Map visualization | folium | Python-owned interactive maps / previews | Pure Python | None (XSS caveat) |
| EXIF GPS ingestion | exifread (or Pillow) | Read GPS from photos | Pure Python (Pillow: wheels) | None (Pillow: heavy) |
Decision framework#
Vector format I/O
- GeoJSON only, serialization → `geojson` (pure Python, zero deps, no build risk).
- Multiple formats or streaming large vector data → `fiona`/`pyogrio`.
- Reading + analyzing tabular geodata → `geopandas` (which also does the I/O).
Geometry operations
- Any buffers, clustering, dedup, predicates, distance → `shapely`.
- None of the above (just store/serialize coordinates) → no geometry engine needed.
Coordinate systems
- Data changes CRS / uses projected grids / needs accurate geodesic distance → `pyproj`.
- Everything stays WGS84 lon-lat → omit `pyproj`; convert EXIF DMS→decimal inline.
Getting coordinates in
- GPS embedded in photos → `exifread` (read-only, healthy) or `Pillow` (if already decoding images).
- Coordinates derived from addresses, or place-name enrichment → `geopy`.
Visualization
- Python should render the map (notebooks, dashboards, previews) → `folium`.
- A dedicated Leaflet/JS front end already consumes GeoJSON → hand it the file; folium adds little.
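The inline EXIF DMS→decimal conversion named under "Coordinate systems" is small enough to sketch in full. The function name and argument layout below are hypothetical; EXIF readers typically expose each component as a rational value plus a hemisphere reference letter, and this sketch assumes those have already been pulled out of the tags.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South latitudes and West longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


lat = dms_to_decimal(47, 36, 36, "N")    # 47 deg 36' 36" N
lon = dms_to_decimal(122, 19, 48, "W")   # 122 deg 19' 48" W
```

When the result feeds GeoJSON, remember to emit it as `(lon, lat)`.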
Use-case fit (S3 persona seeds)#
Different needs assemble different subsets — illustrative, not exhaustive:
- Geotagged-photo → GeoJSON for a web map. `exifread` + `geojson`, plus an inline DMS→decimal conversion; WGS84 throughout, so no `pyproj`. Add `shapely` if clustering or buffering points; `geopy` to backfill photos lacking GPS; `folium` for a local preview before handing the file to an existing Leaflet front end. Two pure-Python dependencies cover the core, with no native build and no known CVEs.
- Spatial analysis / data science. `geopandas` as the spine (it brings shapely + pyproj + pyogrio), for spatial joins against boundaries, overlays, and choropleths.
- Multi-format GIS ETL. `fiona`/`pyogrio` for streaming I/O across Shapefile/GeoPackage/GML, with `pyproj` for reprojection and `shapely` for geometry cleanup.
- Address-driven mapping. `geopy` (Nominatim or a paid backend) → `shapely`/`geojson` → `folium` for a quick interactive result.
Install / build-pain guidance (2026 status)#
The historical “you must apt-get GDAL/GEOS/PROJ first” pain is mostly gone: shapely
(GEOS), pyproj (PROJ), and fiona/pyogrio (GDAL) all ship self-contained binary wheels on
mainstream platforms. Residual risk lives in: (a) fiona/pyogrio bundled wheels missing
optional GDAL drivers and labeled not-for-production — the most build-sensitive members of the
set; (b) brand-new Python versions before wheels are published, which force a source build
of the native deps; (c) niche platforms/architectures without prebuilt wheels. Pure-Python
members (geojson, geopy, folium, exifread) and shapely’s exceptionally reliable GEOS
wheel dodge all three — worth weighting toward when install reliability is a hard constraint.
Security summary#
- Pure-Python members (`geojson`, `exifread`, `geopy`, `folium`): no native attack surface; no known first-party CVEs.
- shapely / pyproj: no known CVEs; bundled GEOS/PROJ native code.
- geopandas CVE-2025-69662: `to_postgis` SQL injection, fixed 1.1.2+; relevant only to PostGIS writers.
- fiona: only transitive-via-GDAL advisories, already mitigated in current wheels.
- Pillow: the standout watch item, with recurring image-parser overflows plus an `ImageMath.eval` RCE class. Pin a current release; parsing image bytes is unavoidable when reading EXIF.
- folium: no CVE, but it does not escape user strings rendered into popups/HTML; sanitize untrusted content.
- geopy: no CVE; main risk is operational (Nominatim rate limits / User-Agent policy).
Maturity watch items#
- fiona: no release since Sep 2024; dropped as a GeoPandas core dependency in favor of pyogrio.
- geopy: no release since Nov 2023 (repo still shows activity); mature/low-churn.
- piexif (2019) and exif (2023): stale/dormant; prefer `exifread` for EXIF reads.
Bottom line#
Assemble the smallest stack that covers the roles you need. geojson + shapely form a
lightweight, reliable core for WGS84 GeoJSON work; add pyproj only when coordinates cross
systems, geopandas only when analysis (not serialization) is the goal, fiona/pyogrio
only for multi-format I/O, geopy for geocoding, and folium when Python owns the map.
shapely#
Overview#
- Type: Python library wrapping the GEOS C++ engine
- License: BSD-3-Clause (bundled GEOS is LGPL-2.1)
- Maintainer: shapely/shapely (Toblerity / GeoPandas ecosystem)
- Category role: Geometry engine — planar (2D) geometry construction and operations
Adoption signals (2026-06)#
- GitHub stars: ~4.5k
- PyPI downloads: ~67M/month (one of the most-downloaded geo packages)
- Latest release: 2.1.2 (2025-09-24)
- Maintenance: Very active, core of the modern Python geo stack
Key capabilities#
- Geometry types (Point, LineString, Polygon, and Multi*/collections) with predicates (contains, intersects, within), set operations (union, intersection, difference), measurement (area, length, distance), and buffering (radius zones around features).
- Convex hull, centroid, simplification, nearest-point — building blocks for simple clustering (buffer-and-union, or wrapping the output of a clustering algorithm as geometries).
- Shapely 2.x adds vectorized NumPy-ufunc array operations — operate on whole arrays of geometries at once rather than Python-loop per feature.
- Interoperates with the `geojson` package and geopandas via `__geo_interface__`.
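A short sketch of the capabilities listed above (buffering, a predicate, a measurement, and a set operation), assuming Shapely 2.x is installed. The coordinates are arbitrary planar units picked for illustration; with raw lon/lat input the buffer radius would be in degrees, not metres.

```python
from shapely.geometry import Point, Polygon

store = Point(0, 0)
zone = store.buffer(10)              # polygon approximating a radius-10 disc
customer = Point(3, 4)

inside = zone.contains(customer)     # spatial predicate
gap = store.distance(customer)       # planar distance between the two points

block = Polygon([(5, -5), (20, -5), (20, 5), (5, 5)])
overlap = zone.intersection(block)   # set operation returning a new geometry
```

All results are ordinary Shapely objects or Python scalars, so they can be passed on to a serializer or a dataframe without conversion.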
Install reliability#
- Clean in 2026. Shapely 2.x binary wheels bundle GEOS (~3.13.x). Installs with no system GEOS required, with wheels for manylinux x86_64/aarch64, musllinux, macOS, and Windows. Only a from-source build needs `libgeos-dev` + a compiler. Widely regarded as the most reliable native-wheel build in the geospatial ecosystem.
CVE check#
- No known CVEs (OSV/PyPI search empty). Bundled GEOS is LGPL native code but carries no outstanding advisory affecting shapely wheels.
Trade-offs vs alternatives#
- For: The standard geometry engine for Python. Needed whenever work goes beyond storing coordinates — buffers, clustering, deduplication of near-identical locations, bounding-box and distance computation, spatial predicates.
- Against: Pure geometry; it does not read/write files (pair with `geojson` or fiona) and does not reproject (that is `pyproj`’s role). For tasks that only assemble and serialize coordinates with no spatial math, shapely is not required.
- vs geopandas: geopandas embeds shapely plus pandas, I/O, and CRS machinery. Use shapely directly when you want geometry operations without a dataframe layer; reach for geopandas when tabular analysis over many features is the point.
Best for#
- Any geometry/spatial-operation step: buffers, clustering, hulls, distance, dedup, predicates.
- The geometry engine under a lightweight I/O layer when a full dataframe stack is unwarranted.
S2: Comprehensive
S2 — Comprehensive Discovery: Approach#
Scope#
This pass examines the core Python geospatial-vector stack at the level of how each library is built and works internally — the native engine it wraps, its in-memory data model, the algorithms it exposes, performance behavior, API design, and how it composes with the other libraries in the stack.
Five libraries are covered, chosen because together they form the de-facto foundation of Python vector geoprocessing:
- Shapely — planar geometry operations (wraps GEOS).
- GeoPandas — tabular geospatial analysis (pandas + Shapely + pyproj).
- pyproj — coordinate reference systems and transformation (wraps PROJ).
- Fiona / pyogrio — vector file I/O (wrap OGR/GDAL).
- geojson — pure-Python GeoJSON encoding/decoding.
What “comprehensive / architecture” means here#
Each library write-up is evaluated against a fixed technical rubric so the documents are directly comparable:
- Native engine: what C/C++ library (if any) does the real work, and how the Python layer binds to it (ctypes, Cython, C-extension, pure Python).
- Core data model: the in-memory representation of geometries/features.
- Key algorithms & techniques: the geometric or numerical operations the library is responsible for.
- Performance characteristics: described qualitatively (vectorization, per-call overhead, copy costs, indexing); no fabricated benchmark numbers.
- API design patterns: the idioms the library encourages.
- Composition: how it interoperates with the rest of the stack (`__geo_interface__`, shared C libraries, object hand-off).
- Technical limitations: what it deliberately does not do, and the sharp edges that matter at scale.
Method#
Analysis is drawn from each project’s documentation, source layout, changelog, and the architecture of the C libraries they bind. Version and adoption facts are pinned to verified releases current as of mid-2026. Where exact timing or throughput would require benchmarking on specific hardware, the text describes the shape of the performance instead of inventing figures.
Category-first stance#
Libraries are judged on their own technical merits across the whole geospatial-vector domain — not against any single pipeline or “minimal dependency” goal. A library that is excellent for streaming I/O and weak for analytics is described that way, leaving the use-case fit to S3/S4.
Code snippets#
Snippets are short (3-8 lines), fenced as python, and exist only to show
API shape — never installation, setup, or end-to-end tutorials.
Fiona & pyogrio — Technical Architecture#
Overview#
Fiona and pyogrio are the vector file I/O layer of the Python geospatial stack. Neither computes geometry or transforms coordinates; their job is to read and write vector data — Shapefiles, GeoPackages, GeoJSON, FlatGeobuf, PostGIS tables, and the long tail of formats — by binding to OGR, the vector half of GDAL. They differ fundamentally in their access model: Fiona is a feature-streaming library, pyogrio is a vectorized bulk reader/writer. They are treated together here because they solve the same problem with opposite performance trade-offs, and because GeoPandas now ships with pyogrio as its default engine and Fiona as an option.
- Fiona — current release v1.10.1 (September 2024; no newer release since), BSD-3-Clause.
- pyogrio — the modern vectorized OGR reader/writer, GeoPandas’ default I/O engine.
Native engine: OGR / GDAL#
Both libraries bind to OGR, GDAL’s vector data abstraction layer. OGR provides a uniform model — datasource → layer → feature → geometry/fields — over dozens of underlying drivers, each of which knows how to parse one storage format. The Python library’s responsibility is to drive OGR’s C/C++ API and convert between OGR’s representations and Python.
Both publish binary wheels that bundle GDAL (with GEOS and PROJ), so a standard install ships a working GDAL without a system GDAL build — historically one of the most painful dependencies in the Python ecosystem.
Driver caveat (Fiona)#
Fiona’s wheels deliberately omit some GDAL drivers to keep the binary manageable, and the project flags these wheels as convenience builds “not for production” when an exotic driver is required. Production deployments needing the full driver set are expected to link against a system GDAL. pyogrio’s wheels are likewise GDAL-bundled; driver availability tracks the bundled GDAL build.
Core data model#
Fiona — features as dictionaries#
Fiona reads a layer as an iterator of feature records, each a plain Python dict-like object following the GeoJSON-feature shape:
```python
import fiona

with fiona.open("roads.gpkg") as src:
    crs, schema = src.crs, src.schema
    for feat in src:                 # streams one feature at a time
        geom = feat["geometry"]      # geo-interface mapping (not Shapely)
        props = feat["properties"]
```

The geometry is a GeoJSON-style mapping, not a Shapely object; Fiona stays deliberately I/O-only and leaves geometry construction to the caller (typically via `shapely.geometry.shape`). A layer also exposes its schema (field names and types) and CRS. The streaming model means Fiona’s memory footprint is roughly one feature at a time, which is what makes it suitable for arbitrarily large files.
pyogrio — columns as arrays#
pyogrio reads a whole layer (or a filtered subset) into a columnar, vectorized
result: geometries as a packed WKB array and attributes as NumPy/Arrow
columns, materialized in one batched pass through OGR. It exposes
read_dataframe (returns a GeoPandas GeoDataFrame directly) and lower-level
read/write functions. The data model is “the whole table at once,” the
inverse of Fiona’s “one feature at a time.”
Key techniques#
- Driver abstraction — both rely on OGR to select a driver by format and translate its native records into a common shape; format-specific quirks (Shapefile’s field-name length limits, encoding handling, GeoPackage SQLite access) are handled in OGR.
- Schema/CRS introspection — both surface a layer’s field schema and CRS metadata so downstream code knows the structure before reading.
- Spatial / attribute filtering — both can push a bounding-box or attribute filter into OGR so only matching features are materialized, avoiding a full read.
- Streaming vs batching — Fiona’s per-feature generator vs pyogrio’s single vectorized read/write is the core algorithmic distinction and the source of their performance difference.
- WKB as the interchange — pyogrio moves geometry as Well-Known Binary arrays, which is cheap to hand to Shapely’s vectorized constructors.
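To make the WKB point in the list above concrete, here is a hand-rolled sketch of the Well-Known Binary layout for a single 2D point, using only the standard library. It is an illustration of the byte format, not a description of how pyogrio or Shapely implement it; the helper names are hypothetical, and in practice Shapely's own WKB readers would do this work.

```python
import struct


def encode_wkb_point(x, y):
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)   # 1 = little-endian, 1 = Point


def decode_wkb_point(buf):
    """Decode a 2D WKB point, honouring the byte-order flag in the first byte."""
    endian = "<" if buf[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", buf[1:21])
    if geom_type != 1:
        raise ValueError(f"not a WKB Point (type={geom_type})")
    return x, y


wkb = encode_wkb_point(-122.33, 47.61)   # 21 bytes: 1 + 4 + 8 + 8
```

The fixed, compact layout is what makes an array of WKB blobs cheap to hand across the C/Python boundary in one batch.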
Performance characteristics#
- pyogrio is markedly faster for bulk load/save: by batching the whole layer through OGR and avoiding a Python object per feature, it sidesteps the per-feature interpreter overhead that dominates Fiona on large datasets. This is precisely why GeoPandas adopted it as the default engine.
- Fiona’s strength is bounded memory: streaming one feature at a time means it can process files far larger than RAM, and it is convenient when the workload is “iterate and act on each feature” rather than “load a table.”
- Filtering pays off: pushing bbox/attribute filters into OGR limits both I/O and materialization for both libraries.
- I/O is usually the pipeline bottleneck, so the engine choice has outsized impact on end-to-end GeoPandas performance for read/write-heavy work.
API design patterns#
- Fiona: context-manager + iterator (`with fiona.open(...) as src: for feat in src`), dict-shaped features, geo-interface geometries. Idiomatic for ETL and per-feature transformation.
- pyogrio: function-oriented and dataframe-centric (`read_dataframe`/`write_dataframe`), designed to be the invisible engine underneath `geopandas.read_file`.
- Both are I/O-pure: they read and write; geometry math, CRS transforms, and analytics are explicitly someone else’s job.
Composition with the stack#
- GeoPandas calls pyogrio (default) or Fiona (optional) to populate and serialize `GeoDataFrame`s; the engine is selectable via `engine=`.
- Shapely consumes Fiona’s geo-interface mappings (`shape()`) or pyogrio’s WKB arrays to build geometry objects.
- pyproj parses the CRS metadata these libraries surface so coordinates can be interpreted and reprojected.
- geojson / GeoJSON overlaps with Fiona for the specific GeoJSON format, but Fiona/pyogrio cover the entire OGR driver universe, not just GeoJSON.
Technical limitations#
- GDAL dependency weight — both pull in GDAL, the heaviest native dependency in the stack; behavior and format support are tied to the bundled GDAL version.
- Fiona driver gaps in wheels — the convenience wheels omit some drivers and are flagged “not for production”; full driver coverage may require a system GDAL.
- Fiona maintenance cadence — no release since v1.10.1 (Sep 2024); the ecosystem’s I/O momentum has shifted toward pyogrio.
- pyogrio’s whole-layer model — its speed comes from materializing the layer; for truly memory-bound streaming of enormous files, Fiona’s per-feature model is still the right tool.
- No analytics — neither library does geometry operations, joins, or transforms; they are strictly the read/write boundary of the stack.
geojson — Technical Architecture#
Overview#
geojson is the smallest library in the stack and the most narrowly scoped: it
is a pure-Python implementation of the GeoJSON data format. It provides
Python classes and encode/decode helpers for GeoJSON’s object types — Point,
LineString, Polygon, their Multi* variants, GeometryCollection,
Feature, and FeatureCollection — and integrates with Python’s standard
json module so GeoJSON serializes and parses like any other JSON. It does no
geometry computation, no coordinate transformation, and no file-format
abstraction; it is a typed model and (de)serializer for one specific format.
Current release at the time of writing is v3.3.0 (May 2026), licensed BSD-3-Clause, maintained under the Jazzband collective. Its defining property is zero dependencies — pure Python, no compiled engine, no GDAL, no NumPy.
Native engine: none (pure Python)#
Unlike every other library in this survey, geojson wraps no C/C++ engine. It is
ordinary Python built on the standard library’s json module. This is its whole
value proposition: it adds no native build, no wheels-with-bundled-libraries
concern, and no version coupling to GEOS/PROJ/GDAL. The cost is that it does
nothing computational — it only models and (de)serializes the format.
Core data model#
The library mirrors the GeoJSON object hierarchy as Python classes that
subclass dict. Because instances are dicts, they serialize through the
standard json module directly:
```python
import geojson

pt = geojson.Point((-122.33, 47.61))     # (lon, lat) order per RFC 7946
feat = geojson.Feature(geometry=pt, properties={"name": "Seattle"})
s = geojson.dumps(feat)                  # -> GeoJSON text
back = geojson.loads(s)                  # -> typed geojson objects
```

Key model facts:
- Coordinate order is (longitude, latitude) and the coordinate reference system is WGS84 / EPSG:4326 by RFC 7946: GeoJSON is, by specification, lon/lat decimal degrees. The library models the format; it does not enforce or convert datums.
- `Feature` binds a geometry to a `properties` dict and optional `id`; `FeatureCollection` is a list of features. These are the units most real-world GeoJSON files use.
- Objects expose an `is_valid`/validation check for structural conformance (right coordinate nesting, required members). This is structural validity, not topological validity (it will not tell you a polygon self-intersects; that is GEOS’s job).
Key techniques#
- Typed (de)serialization: `dumps`/`loads` produce and consume the correct GeoJSON object types rather than raw nested dicts, so downstream code gets attribute access and type identity.
- `__geo_interface__` support: every object implements the Python geo-interface protocol, the shared contract that lets Shapely, GeoPandas, and Fiona exchange geometries with `geojson` without any of them depending on it. This is the library’s main composition mechanism.
- Structural validation: checks that an object conforms to the GeoJSON grammar (member presence, coordinate nesting/arity).
- Stdlib JSON integration: because objects subclass `dict`, they interoperate with `json.dump`/`json.load` and any tooling that speaks plain JSON.
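The geo-interface protocol named in the list above is only a convention: an object exposes a `__geo_interface__` attribute that returns a GeoJSON-like mapping. The sketch below shows the shape of that contract with the standard library alone; the `Waypoint` class is a hypothetical application type, not part of any library.

```python
import json


class Waypoint:
    """Hypothetical application class that opts in to the geo-interface protocol."""

    def __init__(self, lon, lat, name):
        self.lon, self.lat, self.name = lon, lat, name

    @property
    def __geo_interface__(self):
        # The protocol: return a GeoJSON-like mapping describing this object.
        return {"type": "Point", "coordinates": (self.lon, self.lat)}


wp = Waypoint(-122.33, 47.61, "Seattle")
feature = {
    "type": "Feature",
    "geometry": wp.__geo_interface__,
    "properties": {"name": wp.name},
}
text = json.dumps(feature)   # tuples serialize as JSON arrays
```

An object like `wp` can be passed to consumers that accept a geo-interface, such as `shapely.geometry.shape()`, without either side importing the other.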
Performance characteristics#
- Pure-Python parsing speed: encode/decode runs at Python `json` speed, with per-object construction overhead. For small to moderate GeoJSON documents this is a non-issue; for very large feature collections the per-feature Python object creation makes it slower than a vectorized OGR read (pyogrio) that materializes columns in C.
- No native install cost: zero dependencies means trivial, fast, portable installation, with no wheels-with-bundled-GDAL, no PROJ grids, and nothing to compile. In constrained or air-gapped environments this is a genuine architectural advantage.
- Memory: holds the whole parsed document as Python objects; like any JSON parse, memory scales with document size and there is no streaming model in the core library.
API design patterns#
- `dumps`/`loads` mirror the stdlib `json` API exactly, lowering the learning curve to zero for anyone who knows `json`.
- Constructor objects: build geometry/feature objects directly (`geojson.Point(...)`, `geojson.Feature(...)`) and let them serialize themselves.
- Geo-interface as the bridge: pass a `geojson` object anywhere a Shapely `shape()` or GeoPandas constructor accepts a geo-interface, and vice versa.
Composition with the stack#
- Shapely converts to/from `geojson` (and any geo-interface object) via `shape()`/`mapping()`; `geojson` is the convenient typed carrier for the GeoJSON format specifically.
- GeoPandas can ingest GeoJSON through pyogrio/Fiona (full OGR path) but the `geojson` library is handy for lightweight, dependency-free GeoJSON construction and inspection without spinning up GDAL.
- Web / API boundaries: because GeoJSON is the dominant interchange format for web maps and HTTP APIs, `geojson` is frequently the serialization layer at the edge of a service, where pulling in GDAL would be overkill.
Technical limitations#
- No computation — no predicates, overlay, buffering, measurement, or transforms; it strictly models the format.
- One format only — GeoJSON exclusively; no Shapefile, GeoPackage, or other OGR formats (that is Fiona/pyogrio’s role).
- Fixed CRS by spec — RFC 7946 GeoJSON is WGS84 lon/lat; the library does not reproject and does not (by the modern spec) carry alternate CRS metadata, so coordinate-system handling must happen elsewhere (pyproj).
- Structural validation only — it checks grammar, not topological validity; a structurally valid polygon can still be geometrically invalid.
- Pure-Python throughput ceiling — for very large feature collections, a vectorized OGR reader will outperform it on parse/serialize speed.
GeoPandas — Technical Architecture#
Overview#
GeoPandas extends pandas to the geospatial domain. It adds a geometry column type to the familiar DataFrame, so a table of rows-with-attributes can also carry a vector geometry per row and support spatial operations — joins, overlays, reprojection, dissolves — using the same indexing and grouping idioms that pandas users already know. It is the integration layer of the Python vector stack rather than a geometry engine in its own right.
Current release at the time of writing is v1.1.3 (around March 2026),
licensed BSD-3-Clause, requiring Python >= 3.10. The 1.0 milestone
stabilized the API and made pyogrio the default I/O engine, with fiona
demoted to an optional alternative.
Native engine: composition, not C#
GeoPandas has essentially no geometry C code of its own. It is an orchestration library that delegates each concern to a specialist:
- Geometry operations → Shapely (and therefore GEOS).
- Coordinate transforms / CRS → pyproj (and therefore PROJ).
- File I/O → pyogrio by default (and therefore OGR/GDAL), with fiona optional.
- Tabular machinery, indexing, grouping, alignment → pandas / NumPy.
- Optional PostGIS read/write (`read_postgis`/`to_postgis`) → SQLAlchemy + PostGIS.
This “thin glue over strong specialists” design is the central architectural fact about GeoPandas: its correctness and performance are largely inherited from the libraries beneath it.
Core data model#
Two types extend pandas:
- `GeoSeries`: a pandas `Series` whose values are Shapely geometries, stored in a dedicated extension array (`GeometryArray`) that holds a NumPy object array of Shapely geometries (each wrapping a GEOS pointer) under the hood. The `GeoSeries` carries a single CRS for the whole column.
- `GeoDataFrame`: a pandas `DataFrame` with one or more geometry columns, one of which is designated the active geometry (accessed via `.geometry`). Ordinary columns hold attributes; spatial operations act through the active geometry column.

```python
import geopandas as gpd

gdf = gpd.read_file("regions.gpkg")   # pyogrio reads -> GeoDataFrame
gdf = gdf.to_crs(3857)                # pyproj reprojects the whole column
gdf["a"] = gdf.area                   # vectorized Shapely measurement
```

Because the geometry column is a NumPy-backed extension array, GeoPandas can hand the whole array to Shapely’s vectorized ufuncs in one shot; the 2.x Shapely rewrite is what gives GeoPandas its modern speed.
Key operations and techniques#
- CRS management & reprojection: `set_crs` records a CRS; `to_crs` reprojects every geometry through a cached pyproj `Transformer`. The CRS is metadata on the column, validated on operations that combine layers.
- Spatial joins: `sjoin` matches rows of two frames by a spatial predicate (`intersects`, `within`, `contains`, etc.); `sjoin_nearest` matches by proximity. Both build an STRtree over one side to avoid the O(n·m) cartesian blow-up.
- Overlay: `overlay` performs set-theoretic combination of two polygon layers (intersection/union/difference/identity), the GIS “clip two layers together” operation, delegating the per-pair geometry math to Shapely/GEOS.
- Dissolve: `dissolve` is a spatial group-by; rows are grouped by an attribute and their geometries unioned, with attribute aggregation handled by the pandas group-by engine.
- Clip, buffer, simplify, centroid, area, length, distance: vectorized pass-throughs to Shapely array functions.
- Spatial indexing: `.sindex` exposes an STRtree built lazily over the active column for user-driven candidate queries.
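As an illustration of the spatial-join operation in the list above, the sketch below builds two tiny in-memory frames and joins them with `sjoin`. It assumes geopandas and Shapely are installed; the store names, region names, and coordinates are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import Point, box

stores = gpd.GeoDataFrame(
    {"store": ["A", "B", "C"]},
    geometry=[Point(1, 1), Point(5, 5), Point(9, 1)],
    crs="EPSG:4326",
)
regions = gpd.GeoDataFrame(
    {"region": ["west", "east"]},
    geometry=[box(0, 0, 4, 4), box(8, 0, 12, 4)],
    crs="EPSG:4326",
)

# Keep only stores that fall within a region; B matches nothing and is dropped.
joined = gpd.sjoin(stores, regions, predicate="within", how="inner")
pairs = sorted(zip(joined["store"], joined["region"]))
```

Both frames declare the same CRS, which is what lets the join proceed without a mismatch warning; switching `how="inner"` to `how="left"` would keep store B with empty region columns.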
Performance characteristics#
- Vectorization is inherited: column-wide geometry operations run through Shapely’s C ufuncs, so a single `gdf.area` or `gdf.buffer(d)` loops in C, not Python. Avoid `.apply()` with a Python lambda over geometries when a vectorized method exists.
- Joins scale via the index: `sjoin`/`sjoin_nearest` are practical on large frames precisely because they use the STRtree; without it they would be quadratic.
- I/O is the usual bottleneck, and the default `pyogrio` engine matters a lot here: it reads/writes whole layers in a vectorized OGR path rather than feature-by-feature, which is markedly faster than the fiona streaming path for bulk loads.
- CRS transforms are cached: building a pyproj `Transformer` is relatively costly, so reusing it across a column (as `to_crs` does) amortizes that cost.
- Single-machine, in-memory: like pandas, the whole frame lives in RAM. Beyond memory limits, the ecosystem answer is Dask-GeoPandas (partitioned, parallel) or pushing work into PostGIS, not GeoPandas alone.
API design patterns#
- Pandas mirroring: almost every spatial method returns a new `GeoDataFrame`/`GeoSeries`, composes with `loc`/`iloc`/`groupby`, and follows pandas naming. The learning curve for a pandas user is mostly “where is the geometry.”
- Active geometry column: operations implicitly use `.geometry`; multiple geometry columns are allowed but one is “active” at a time.
- CRS-as-metadata: the CRS travels with the column and is checked when layers are combined, catching the classic “two datasets in different projections” error that raw Shapely would silently mis-compute.
- Engine selection: `read_file`/`to_file` accept an `engine=` argument to choose `pyogrio` (default) or `fiona`.
Composition with the stack#
GeoPandas is the composition point. It is the place where Shapely geometry,
pyproj CRS handling, and OGR-based I/O are stitched into one tabular object.
Outward, a GeoDataFrame interoperates with the broader PyData world: plotting
through Matplotlib (.plot()) and interactive maps (.explore() via folium),
analytics handoff to NumPy/scikit-learn, raster crossover with rasterio/xarray,
and scale-out via Dask-GeoPandas. Individual geometries remain plain Shapely
objects, so dropping down to the geometry layer is seamless.
Technical limitations#
- No geometry engine of its own — every geometric guarantee and bug is inherited from GEOS via Shapely; GeoPandas cannot fix robustness issues itself.
- In-memory, single-process — larger-than-RAM or parallel workloads require Dask-GeoPandas or a database; core GeoPandas does not parallelize.
- Heavy dependency surface — it pulls in pandas, Shapely (GEOS), pyproj (PROJ), and pyogrio/fiona (GDAL); the transitive native footprint is large compared to the leaf libraries it orchestrates.
- Planar geometry assumptions — area/length/distance are planar (Shapely semantics); meaningful metric results require reprojecting to a suitable projected CRS first.
- One CRS per geometry column — mixed-CRS data must be reprojected to a common frame before combined operations.
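The planar-geometry limitation is easy to see on a small example. The sketch below assumes GeoPandas 1.x with its bundled PROJ database, and it picks EPSG:6933 (a global equal-area projection) as one reasonable projected CRS for area work; the frame and variable names are illustrative only.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# One 1-degree by 1-degree cell at the equator, declared as WGS84 lon/lat.
cells = gpd.GeoDataFrame(
    {"name": ["cell"]}, geometry=[box(0, 0, 1, 1)], crs="EPSG:4326"
)

# Planar area on raw degrees: the number is in "square degrees", not meters.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # GeoPandas warns about exactly this mistake
    area_deg2 = cells.area.iloc[0]

# Reproject to an equal-area CRS first, then measure in square meters.
area_m2 = cells.to_crs("EPSG:6933").area.iloc[0]

print(area_deg2)      # square degrees, not a physical area
print(area_m2 / 1e6)  # square kilometers
```

The choice of projected CRS depends on the study area; a local UTM zone or a national grid is often a better fit than a global projection.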
pyproj — Technical Architecture#
Overview#
pyproj is the coordinate-reference-system (CRS) and coordinate-transformation library of the Python geospatial stack. It answers the questions Shapely deliberately ignores: what does a coordinate mean, and how do I move a coordinate from one reference frame to another. It also provides geodesic computation on the ellipsoid — true distances, areas, and forward/inverse problems on a curved Earth. It is the canonical Python binding to PROJ, the foundational C transformation library used across the entire open-source GIS world (GDAL, GRASS, QGIS, PostGIS all rely on PROJ).
Current release at the time of writing is v3.7.2 (August 2025), licensed MIT.
Native engine: PROJ#
pyproj wraps the PROJ C library via Cython. PROJ holds the transformation machinery, the projection method implementations (Mercator, UTM, Albers, Lambert Conformal Conic, and hundreds more), the ellipsoid/datum definitions, and the WKT/PROJ-string/EPSG parsing logic. Binary wheels on PyPI bundle PROJ, so a standard install ships a complete transformation engine without a system PROJ.
PROJ ships with a built-in database (an SQLite catalog of the EPSG registry and
related authorities) that lets pyproj resolve a CRS by code (e.g. EPSG:4326)
to its full definition, and discover the available transformation pipelines
between any two CRSs.
Datum grids downloaded on demand#
High-accuracy datum transformations (e.g. NAD83↔NAD27, vertical datums, country-specific shift grids) require large grid files that PROJ does not bundle. When network access is enabled (it is off by default), pyproj/PROJ can download these grids on demand from a CDN and cache them locally. Without the grids, PROJ falls back to a lower-accuracy transformation path rather than failing. This is an important subtlety: results look plausible but can differ by anywhere from centimeters to several meters, depending on the datum pair and which grids are present.
Core data model#
pyproj exposes three primary objects:
- `CRS` — an immutable description of a coordinate reference system. It can be constructed from an EPSG code, a WKT2 string, a PROJ string, a PROJJSON document, or an authority name, and it exposes the axis order, units, datum, ellipsoid, and area of use.
- `Transformer` — a compiled, reusable transformation between a source and target CRS. Building one resolves the best available pipeline (a sequence of PROJ operations, possibly using grids); applying it transforms coordinate arrays.
- `Geod` — geodesic calculations on a chosen ellipsoid: the forward problem (point + azimuth + distance → point), the inverse problem (two points → distance + azimuths), and geodesic polygon area/perimeter.
```python
from pyproj import Transformer

# Build once, reuse: construction is the expensive step.
t = Transformer.from_crs(4326, 3857, always_xy=True)
x, y = t.transform(-122.33, 47.61)  # (lon, lat) in, Web Mercator meters out
```
Key algorithms and techniques#
- Projection forward/inverse — mapping geographic (lat/lon on an ellipsoid) to/from projected planar coordinates, per the chosen projection’s math.
- Datum transformation — moving between reference frames via Helmert (7-parameter) transforms or, for higher accuracy, grid-shift interpolation.
- Pipeline resolution — given two CRSs, PROJ enumerates candidate operations and picks the most accurate one whose required grids are available, ranking by declared accuracy and area of use.
- Geodesics — Karney’s algorithm (the modern, highly accurate geodesic method) underlies `Geod`, giving correct distances and areas on the ellipsoid rather than the plane.
- Axis-order handling — the perennial lat/lon vs. lon/lat confusion is managed explicitly via the `always_xy` flag, which normalizes to (x=lon, y=lat) input order regardless of the CRS’s declared axis order.
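To make "projection forward/inverse" concrete, here is a hand-written sketch of the spherical Web Mercator (EPSG:3857) formulas using only the standard library. It is for illustration: pyproj delegates this math to PROJ, and the function names below are hypothetical rather than part of any library.

```python
import math

# Sphere radius used by Web Mercator (the WGS84 semi-major axis), in meters.
R = 6378137.0


def mercator_forward(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Geographic lon/lat in degrees -> Web Mercator x/y in meters."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_inverse(x: float, y: float) -> tuple[float, float]:
    """Web Mercator x/y in meters -> geographic lon/lat in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


# Round trip: forward to planar meters, then back to degrees.
x, y = mercator_forward(-122.33, 47.61)
lon, lat = mercator_inverse(x, y)
```

Most projections are far more involved than this one (ellipsoidal terms, false eastings, zone parameters), which is a large part of why the stack leaves the math to PROJ.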
Performance characteristics#
- Build once, transform many: constructing a `Transformer` is the expensive step (it queries the PROJ database and compiles a pipeline). Reusing a single transformer across many points is dramatically cheaper than rebuilding it — this is exactly why GeoPandas caches transformers for `to_crs`.
- Vectorized transforms: `Transformer.transform` accepts NumPy arrays and transforms them in a tight loop in C, so transforming millions of coordinates is efficient when passed as arrays rather than per-point Python calls.
- Grid-dependent accuracy/latency: the first transform that needs a remote grid pays a one-time download cost; thereafter the cached grid is used.
- `Geod` is per-call math — geodesic computations are cheap individually and also accept arrays for batch forward/inverse problems.
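The "build once, transform many" pattern looks like this in practice. This is a minimal sketch assuming pyproj 3.x and NumPy; the random coordinates and variable names are made up for the example.

```python
import numpy as np
from pyproj import Transformer

# Build the transformer once (the expensive step)...
to_mercator = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

# ...then push whole coordinate arrays through it in a single call.
rng = np.random.default_rng(0)
lons = rng.uniform(-180.0, 180.0, size=100_000)
lats = rng.uniform(-80.0, 80.0, size=100_000)
xs, ys = to_mercator.transform(lons, lats)  # NumPy arrays in, NumPy arrays out
```

Calling `to_mercator.transform` once per point inside a Python loop would give the same numbers, but it pays the Python-to-C call overhead for every coordinate.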
API design patterns#
- Immutable CRS, reusable Transformer — the natural idiom is “describe the CRSs as objects, build a transformer, apply it repeatedly.”
- `always_xy=True` — the standard defensive setting to avoid axis-order surprises; most stack code uses it.
- Multiple construction sources — `CRS.from_epsg`, `from_wkt`, `from_proj4`, `from_user_input` accommodate whatever CRS representation the data arrived with.
- Array-in / array-out — transforms are NumPy-aware, encouraging bulk coordinate processing.
Composition with the stack#
- GeoPandas uses pyproj `CRS` objects as the metadata it attaches to a geometry column and uses cached `Transformer`s to implement `to_crs`.
- Shapely has no CRS notion; `shapely.ops.transform` takes a pyproj transformer’s `transform` callable to reproject geometry vertex by vertex.
- Fiona / pyogrio read a layer’s CRS as WKT/EPSG, which pyproj parses into a `CRS`; on write, pyproj serializes a `CRS` back to the format the driver wants.
- rasterio shares the same PROJ engine for raster CRS handling, so raster and vector reprojection stay consistent.
In short, pyproj is the shared “meaning of coordinates” authority that keeps the rest of the stack honest about what the numbers represent.
Technical limitations#
- It does not do geometry — pyproj transforms coordinates and computes geodesics, but it has no concept of polygons, overlay, or predicates; that is Shapely’s domain.
- Grid availability affects accuracy — high-precision datum shifts silently degrade to lower-accuracy paths if the required grids are absent and downloads are disabled (a real concern in offline/air-gapped environments).
- Axis-order footgun — forgetting `always_xy` produces swapped coordinates with some CRSs; results are correct only if the developer sets it consistently.
- No vector I/O, no tabular model — it is a focused transformation/geodesy library and intentionally nothing more.
- PROJ version coupling — behavior and available transformations track the bundled PROJ version; results can shift subtly across PROJ upgrades as the EPSG database and pipeline selection evolve.
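The axis-order footgun is easiest to understand side by side. This sketch assumes pyproj 3.x; it builds the same EPSG:4326 to EPSG:3857 transformer twice, once honoring the authority's declared (lat, lon) order and once with `always_xy=True`, and then runs one geodesic inverse problem with `Geod`. Variable names are illustrative.

```python
from pyproj import Geod, Transformer

lon, lat = -122.33, 47.61

# Without always_xy, EPSG:4326 input follows the authority order: (lat, lon).
t_authority = Transformer.from_crs("EPSG:4326", "EPSG:3857")
x1, y1 = t_authority.transform(lat, lon)

# With always_xy=True, input is always (x=lon, y=lat).
t_xy = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
x2, y2 = t_xy.transform(lon, lat)

# Both calls describe the same point, so the projected results agree.
same_point = abs(x1 - x2) < 1e-6 and abs(y1 - y2) < 1e-6

# Geod takes lon/lat order; inv() returns forward azimuth, back azimuth,
# and distance in meters on the ellipsoid.
geod = Geod(ellps="WGS84")
az_fwd, az_back, dist_m = geod.inv(0.0, 0.0, 1.0, 0.0)  # one degree along the equator
```

Passing `(lon, lat)` to `t_authority` would not raise an error; it would quietly return a different, wrong location, which is why the flag is worth setting everywhere.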
S2 — Technical Recommendation & Comparison#
How these libraries differ technically#
These five libraries are not competitors — they are layers of one stack, each owning a distinct technical concern. Understanding the geospatial-vector domain means understanding the separation:
- geojson is the only pure-Python, zero-dependency member. It models a single interchange format and computes nothing. Its architecture is “typed dicts + stdlib JSON.”
- Shapely is the geometry engine binding. All real geometric computation — predicates, overlay, buffering, measurement — happens here, in GEOS (C++). Its 2.x architecture stores GEOS pointers in NumPy object arrays and exposes operations as NumPy ufuncs, which is the performance foundation of the whole stack.
- pyproj is the coordinate-meaning authority, binding PROJ (C). It is the only member that understands CRSs, reprojection, and geodesy. Shapely deliberately delegates all CRS concerns to it.
- Fiona / pyogrio are the I/O boundary, binding OGR/GDAL (C/C++). Fiona streams features as dicts (bounded memory); pyogrio reads/writes whole layers vectorized (speed) and is GeoPandas’ default engine.
- GeoPandas is the orchestration layer. It has almost no geometry C code of
its own — it stitches Shapely + pyproj + pyogrio/Fiona onto a pandas
DataFrame, inheriting their correctness and performance.
The clean architectural rule: geojson models a format, Shapely computes
geometry, pyproj interprets coordinates, Fiona/pyogrio move data in and out, and
GeoPandas binds them into a table. The shared glue between the leaf libraries is
the __geo_interface__ protocol plus the common bundled C libraries
(GEOS/PROJ/GDAL).
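The `__geo_interface__` protocol is small enough to show with the standard library alone. In this sketch the `Waypoint` class and its fields are hypothetical; the only convention taken from the protocol is a `__geo_interface__` property that returns a GeoJSON-like dict.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Waypoint:
    """A hypothetical domain object that is not a Shapely geometry."""

    name: str
    lon: float
    lat: float

    @property
    def __geo_interface__(self) -> dict:
        # GeoJSON-like mapping; coordinates are in (lon, lat) order.
        return {"type": "Point", "coordinates": (self.lon, self.lat)}


wp = Waypoint("trailhead", -122.33, 47.61)

# Any consumer that understands the protocol can read the geometry
# without importing the producer's library.
geometry = wp.__geo_interface__
feature = {"type": "Feature", "geometry": geometry, "properties": {"name": wp.name}}
text = json.dumps(feature)
```

Consumers that implement the protocol, such as Shapely's `shape()` helper, accept an object like this one directly.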
Performance shape (qualitative)#
- Shapely 2.x — vectorized array ufuncs run loops in C; bulk operations are fast, scalar per-call operations carry Python-boundary overhead. STRtree turns pairwise work into near-O(n log n) candidate generation.
- GeoPandas — inherits Shapely’s vectorization; spatial joins scale via the STRtree; I/O speed depends heavily on the engine (pyogrio fast, Fiona streaming).
- pyproj — “build a `Transformer` once, transform arrays many times”; build cost is high, per-array transform cost is low; remote datum grids add one-time latency.
- pyogrio — batched whole-layer I/O, markedly faster than Fiona for bulk load/save.
- Fiona — bounded one-feature-at-a-time memory, ideal for files larger than RAM.
- geojson — pure-Python `json` speed; trivial for small/medium docs, slower than vectorized OGR for very large feature collections; zero install cost.
Architecture / feature comparison#
| Dimension | Shapely | GeoPandas | pyproj | Fiona / pyogrio | geojson |
|---|---|---|---|---|---|
| Native engine | GEOS (C++) | none (composes) | PROJ (C) | OGR/GDAL (C/C++) | none (pure Python) |
| Primary concern | geometry ops | tabular orchestration | CRS / transform / geodesy | vector file I/O | GeoJSON format model |
| Binding style | C-ext ufuncs (2.x) | pandas extension arrays | Cython | C bindings | pure Python |
| Data model | immutable geometries / geom arrays | GeoSeries / GeoDataFrame | CRS / Transformer / Geod | feature dicts (Fiona) / columnar (pyogrio) | dict-subclass typed objects |
| CRS aware | no | yes (per column) | yes (the authority) | reads/writes CRS metadata | fixed WGS84 (RFC 7946) |
| Vectorized | yes (array ufuncs) | yes (inherited) | yes (array transforms) | pyogrio yes / Fiona streams | no |
| Spatial index | STRtree | .sindex (STRtree) | n/a | n/a | n/a |
| Dependencies | GEOS (bundled) | heavy (whole stack) | PROJ (bundled) | GDAL (bundled) | none |
| Memory model | in-memory | in-memory (single proc) | per-call | Fiona bounded / pyogrio whole-layer | whole document |
| Version (mid-2026) | 2.1.2 (Sep 2025) | 1.1.3 (~Mar 2026) | 3.7.2 (Aug 2025) | Fiona 1.10.1 (Sep 2024) | 3.3.0 (May 2026) |
| License | BSD-3 (GEOS LGPL) | BSD-3 | MIT | BSD-3 | BSD-3 (Jazzband) |
Technical bottom line#
- The non-negotiable core of any serious Python vector workload is Shapely (geometry) + pyproj (CRS) + an OGR-based reader (pyogrio default, Fiona for streaming). GeoPandas sits on top when the work is tabular.
- Shapely 2.x vectorization is the most important architectural development in the stack — it is why GeoPandas is fast and why array-style code should be preferred over per-geometry Python loops.
- pyproj is the discipline layer: because Shapely is CRS-blind, correctness of any distance/area/overlay across datasets depends on reprojecting through pyproj first.
- geojson stands apart as the dependency-free format library — the right tool at web/API edges where pulling in GDAL is unjustified, and the wrong tool for computation or non-GeoJSON formats.
- Shared limitations to keep in view across the stack: planar, in-memory, single-machine. Larger-than-RAM or parallel work means Dask-GeoPandas or a PostGIS/GPU engine, and true Earth-surface metrics require pyproj geodesics rather than Shapely’s planar measurements.
Shapely — Technical Architecture#
Overview#
Shapely is the planar-geometry workhorse of the Python geospatial stack. It provides Python objects and operations for two-dimensional vector geometries — points, lines, polygons, and their multi-part collections — along with the predicates, set operations, and measurements defined by the OGC Simple Features model. It is not a coordinate-system library, not an I/O library, and not a geodesic library; it is a thin, well-shaped Python skin over a mature C++ computational-geometry engine.
Current release at the time of writing is v2.1.2 (September 2025), licensed BSD-3-Clause (the bundled engine is LGPL). The 2.x series was a substantial internal rewrite that reorganized Shapely around NumPy array semantics.
Native engine: GEOS#
Shapely wraps GEOS (Geometry Engine, Open Source), the C++ port of the Java Topology Suite (JTS). GEOS is the same engine PostGIS uses for its geometry operations, which is why a polygon intersection computed in Shapely and the same intersection computed in a PostGIS query produce the same result when both are built against the same GEOS version: the same engine code runs in each. Binary wheels published on PyPI bundle GEOS (approximately 3.13.x), so a typical install pulls a vetted engine without the user compiling or system-installing GEOS.
GEOS owns essentially all the hard geometry: the noding and overlay algorithms, the buffer engine, the spatial predicates, the validity model, and the robustness handling. Shapely’s job is to marshal coordinate data into GEOS’s representation, call the right GEOS function, and marshal results back.
Binding mechanism (the 2.x rewrite)#
The defining architectural change in Shapely 2.0 was how it talks to GEOS:
- Shapely 1.x used `ctypes` to call GEOS, with one Python object per geometry. Every operation crossed the Python/C boundary one geometry at a time, and the per-call overhead dominated when processing many small geometries.
- Shapely 2.x is built on PyGEOS-style C extensions: GEOS geometry pointers are stored inside NumPy object arrays, and operations are exposed as NumPy universal functions (ufuncs). A single call can apply a GEOS operation across an entire array in C, looping at C speed rather than in the Python interpreter.
This is the single most important performance fact about modern Shapely, and it is what makes GeoPandas fast.
Core data model#
A Shapely geometry is an immutable Python object that holds an opaque pointer to a GEOS geometry. The hierarchy mirrors Simple Features:
- `Point`, `LineString`, `LinearRing`, `Polygon`
- `MultiPoint`, `MultiLineString`, `MultiPolygon`, `GeometryCollection`
Geometries are planar and 2D by default (Z coordinates are carried but most operations are computed in the XY plane). Coordinates are floating-point and have no CRS — Shapely treats them as abstract Cartesian numbers. Whether those numbers are degrees of longitude or meters of UTM easting is the caller’s responsibility; this separation is deliberate and is the reason pyproj exists as a separate library.
Geometries are immutable: operations return new objects rather than mutating in place. This makes them hashable and safe to share, at the cost of allocation on every transform.
```python
from shapely import Point, Polygon

poly = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
poly.contains(Point(0.5, 0.5))  # True: a spatial predicate
poly.area                       # 1.0: planar measurement
```
Array-level model#
Alongside the scalar objects, Shapely 2.x exposes a functional, array-first API
in the top-level shapely namespace. These functions accept and return NumPy
arrays of geometries and are the ufunc fast path:
```python
import numpy as np
import shapely

pts = shapely.points(np.random.rand(1_000_000, 2))  # vectorized construction
areas = shapely.area(shapely.buffer(pts, 0.01))     # whole-array ops in C
```
Key algorithms and techniques (provided by GEOS)#
- Spatial predicates — `contains`, `intersects`, `within`, `touches`, `crosses`, `overlaps`, `disjoint`, `covers`, computed via the DE-9IM intersection matrix.
- Set-theoretic overlay — `union`, `intersection`, `difference`, `symmetric_difference`, built on GEOS’s noding/overlay (the modern OverlayNG in recent GEOS improves robustness).
- Buffering — offset-curve construction with configurable join/cap styles and quadrant segmentation; one of the more expensive operations.
- Constructive geometry — convex hull, envelope, centroid, simplification (Douglas-Peucker), minimum rotated rectangle, Voronoi/Delaunay.
- Measurement — planar `area`, `length`, `distance` (note: planar, not geodesic — distances are in the units of the coordinates).
- Validity — `is_valid` plus `make_valid` to repair self-intersections and other OGC violations.
- STRtree — a packed Sort-Tile-Recursive R-tree for bulk spatial indexing and nearest-neighbor / candidate-pair queries.
Robustness#
GEOS performs overlay in floating-point and historically could throw
“TopologyException” on pathological near-degenerate inputs. Newer GEOS (the
3.13-era engine bundled in current wheels) uses OverlayNG with snap-rounding,
which makes these failures far rarer but does not make geometry arithmetic
exact. Practitioners still reach for make_valid and small buffers as defensive
tactics.
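As a small illustration of the validity tools mentioned above, the sketch below repairs a self-intersecting "bowtie" polygon. It assumes Shapely 1.8 or later, where `make_valid` and `explain_validity` are importable from `shapely.validation`; the geometry is made up for the example.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity, make_valid

# A "bowtie": the ring crosses itself at (1, 1), which violates OGC validity.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
print(bowtie.is_valid)           # False
print(explain_validity(bowtie))  # human-readable reason reported by GEOS

# make_valid returns a new, valid geometry (here, the two triangles).
fixed = make_valid(bowtie)
print(fixed.is_valid)            # True
print(fixed.geom_type, fixed.area)
```

Note that the repaired geometry can have a different type than the input, so downstream code should not assume a `Polygon` comes back.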
Performance characteristics#
- Vectorized array ops are the headline: applying a predicate or constructive operation across a large NumPy geometry array runs the loop in C, avoiding per-geometry Python overhead. This is where 2.x dramatically outperforms 1.x.
- Scalar object operations still incur Python object creation and a boundary crossing per call; in tight Python loops this overhead is real. Prefer the array functions for bulk work.
- STRtree turns O(n·m) pairwise tests into near-O(n log n) candidate generation followed by exact predicate checks on a small candidate set — the standard pattern for spatial joins.
- Buffer and overlay are computationally heavy relative to predicates; vertex-dense polygons cost proportionally more.
- Memory: geometries live in C; a NumPy object array of geometries stores pointers, so large collections are memory-efficient compared to one Python object per row in 1.x style.
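The STRtree pattern described above (cheap candidate generation, then exact predicate checks) can be sketched as follows. This assumes the Shapely 2.x API, where `STRtree.query` returns integer indices into the indexed array; the grid data and variable names are illustrative.

```python
import numpy as np
import shapely
from shapely import STRtree, box

# A 10 x 10 grid of points at integer coordinates, built as one array.
xs, ys = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
points = shapely.points(coords)

# Build the index once; queries then test only nearby candidates.
tree = STRtree(points)

# query() returns indices into `points`; the predicate runs an exact
# GEOS test on the candidates the tree selects by bounding box.
window = box(2.5, 2.5, 4.5, 4.5)
hits = tree.query(window, predicate="intersects")
found = sorted(map(tuple, coords[hits].tolist()))
```

Passing an array of query geometries instead of a single one returns pairs of indices, which is the shape a spatial join needs.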
API design patterns#
- Two coexisting styles: object methods (`poly.buffer(1.0)`) for readability on single geometries, and module-level vectorized functions (`shapely.buffer(arr, 1.0)`) for performance on collections. New code that processes many geometries should prefer the latter.
- Immutability and functional return — operations compose by chaining, each step yielding a fresh geometry.
- `__geo_interface__` — Shapely geometries expose and consume the Python geo-interface protocol, the lingua franca that lets Shapely, GeoJSON, GeoPandas, and Fiona exchange geometries without a hard dependency on each other.
Composition with the stack#
- GeoPandas stores Shapely geometries in a `GeoSeries` and dispatches its spatial operations to Shapely’s array functions — GeoPandas is, geometrically, Shapely-over-pandas.
- pyproj does the reprojection Shapely refuses to do; `shapely.ops.transform` accepts a pyproj transformer’s `transform` method to move geometry between CRSs.
- Fiona / pyogrio read features whose geometries are converted to Shapely objects: via the geo-interface for Fiona’s feature dicts, and via WKB for pyogrio.
- geojson / GeoJSON I/O round-trips through the `mapping()`/`shape()` helpers.
Technical limitations#
- No CRS awareness — Shapely neither knows nor checks coordinate systems; mixing CRSs silently produces wrong answers.
- Planar only — measurements and predicates assume a flat plane. Computing “distance” on raw lon/lat degrees yields meaningless units; geodesic work belongs to pyproj’s `Geod`.
- 2D-centric — Z is preserved but not a first-class participant in most operations; there is no true 3D topology.
- Floating-point geometry — exactness is not guaranteed; degenerate inputs can still surprise even with OverlayNG.
- Single-machine, in-memory — no out-of-core or distributed execution; very large datasets must be tiled or pushed to a database/GPU engine.
- LGPL engine — the Shapely Python code is BSD-3, but the bundled GEOS is LGPL, which matters for some redistribution scenarios.
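The "planar only" limitation can be shown without any geospatial library. The sketch below compares a naive Euclidean distance on raw lon/lat degrees with a great-circle distance from the haversine formula. Haversine assumes a spherical Earth and is used here only as a stand-in; pyproj's `Geod` works on the ellipsoid and is more accurate. The function names and the mean-radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def planar_degrees(lon1, lat1, lon2, lat2):
    """Euclidean distance on raw degrees: what a planar engine computes."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60 degrees north.
for lat in (0.0, 60.0):
    d_deg = planar_degrees(0.0, lat, 1.0, lat)
    d_m = haversine_m(0.0, lat, 1.0, lat)
    print(f"lat={lat}: planar={d_deg} deg, great-circle={d_m / 1000:.1f} km")
```

The planar result is the same "1 degree" at both latitudes, while the distance on the ground at 60 degrees north is roughly half of what it is at the equator.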
S3: Need-Driven
S3 Need-Driven Research Approach#
Methodology#
The S3 pass focuses on user personas and their needs rather than technical implementation. The question is not how to call shapely or geopandas, but who reaches for Python geographic libraries and why — what real-world job they are trying to finish, what hurts today, and what “done” looks like for them.
Geographic libraries are unusual because almost nobody wants them for their own sake. People want a map on a web page, a choropleth in a report, a clean GeoPackage handed to a colleague, or an address turned into a dot. The library stack is a means; the persona’s outcome is the end. S3 keeps that ordering.
Structure#
Each use-case file follows a consistent shape:
1. Who They Are#
The persona’s role, domain, and working context — and their relationship to code (some are programmers, some are domain experts who tolerate Python).
2. The Goal#
The concrete outcome they are paid (or motivated) to produce.
3. Constraints & Pain Points#
What gets in the way today: deployment limits, data volume, time, CRS confusion, rate limits, install friction, lack of a backend, etc.
4. Capabilities That Matter#
Which functional roles from the domain (geometry, format I/O, CRS, geocoding, EXIF, visualization) this persona actually needs — and which they can skip.
5. Which Libraries Serve Them#
A grounded mapping from the persona to a stack drawn from the domain set:
shapely, geopandas, fiona/pyogrio, geojson, pyproj, geopy,
folium, and the EXIF readers (exifread/Pillow). Includes what they can
leave out, since over-installing is a real cost.
6. What Success Looks Like#
The signal that the persona’s job is finished and the stack fit.
Design Principles#
- Problem-first — start from the pain, not the package.
- Honest scoping — name what each persona can omit (e.g. WGS84-only flows skip `pyproj`; GeoJSON-only flows skip GDAL-backed I/O).
- No implementation — no code, no install commands; that is S1/S2 territory.
- Distinct personas — five non-overlapping roles, each with a different center of gravity (web map, analysis, ETL, app integration, field science).
- No overselling — a heavy stack is a liability for a light job.
Coverage Strategy#
The five personas span the domain’s real demand:
- Geotagged-photo web mapper — lightest end: photos → GeoJSON → Leaflet, no backend.
- Spatial data scientist — analysis at scale: joins, overlays, choropleths.
- GIS / ETL engineer — multi-format vector plumbing and streaming conversion.
- Web/desktop app developer — geocoding + rendering inside an application.
- Field researcher — coordinates from the field: distance, area, clustering, quick maps.
These are deliberately given equal weight — the photo mapper is as valid a customer of this domain as the data scientist, and the stacks differ sharply.
Relationship to Other Passes#
- S1 (Rapid) — shopping comparison of each library.
- S2 (Comprehensive) — architecture and engine internals (GEOS/GDAL/PROJ).
- S3 (this pass) — the “why bother” and “which stack for whom.”
- S4 (Strategic) — long-term viability and stack assembly guidance.
S3 Recommendation: Matching Stacks to Personas#
The Core Insight#
Nobody needs “geographic libraries” in the abstract — they need a specific outcome, and the right stack falls out of two sizing questions plus the persona’s goal:
- WGS84-only, or multiple CRS? GeoJSON is WGS84 (EPSG:4326, lon-lat) by RFC 7946, and phone/GPS/geocoder output is WGS84. If everything stays in WGS84 and no distances/areas are measured, you can skip `pyproj` entirely. The moment you measure (meters, m²) or mix coordinate systems, `pyproj` becomes mandatory — skipping it is a common silent error in spatial work.
- Serialize, or analyze? Just reading/writing GeoJSON points → the pure-Python `geojson` package. Joins, overlays, choropleths at scale → `geopandas`. Format-to-format conversion → `pyogrio`/`fiona`.
Over-installing is a real cost. The photo mapper dragging in geopandas, or the
store-locator app pulling the ETL stack, ships liability for capabilities it
never uses. Each persona below names what to leave out as deliberately as
what to include.
Persona → Stack Table#
| Persona | Core need | Recommended stack | Can skip |
|---|---|---|---|
| Geotagged-photo web mapper | Photos → GeoJSON → Leaflet map, no backend | exifread (or Pillow) + hand-rolled DMS→decimal + geojson + folium | pyproj, shapely, geopandas, fiona/pyogrio, geopy |
| Spatial data scientist | Joins, overlays, choropleths at scale | geopandas (core) + shapely + pyproj + pyogrio; folium for interactive maps | EXIF readers; geopy unless inputs are addresses |
| GIS / ETL engineer | Multi-format vector conversion, streaming | pyogrio (bulk) + fiona (streaming) + pyproj + shapely; geopandas/geojson as needed | folium, geopy, EXIF readers |
| Web/desktop app developer | Geocoding + render results | geopy (core) + folium (server-side) and/or geojson (JS front end); shapely if light geometry | geopandas, fiona/pyogrio, pyproj, EXIF readers |
| Field researcher | Distance, area, clustering, quick maps | shapely + pyproj (metric measurement) + geopandas (tabular) + folium; exifread if geotagged photos | fiona/pyogrio, geopy, geojson (unless GeoJSON handoff) |
Cross-Cutting Notes#
- `pyproj` is the dividing line. Two personas can skip it (photo mapper, most app developers — both WGS84-only, no measurement); three cannot (data scientist, ETL engineer, field researcher). If a persona ever reports a distance or area, `pyproj` is required, whether called directly or through geopandas, which delegates to it.
- `geojson` vs. the GDAL stack. For GeoJSON-only serialization, the zero-dependency `geojson` package wins on footprint and install reliability. Reach for `fiona`/`pyogrio`/`geopandas` only when multi-format I/O or analysis is the actual goal.
- `pyogrio` over `fiona` by default. `pyogrio` is the modern, vectorized, preferred I/O engine and is much faster on large files. Prefer `fiona` only for its record-by-record streaming of very large files under a memory budget — and note `fiona` has had no release since Sept 2024 (dropped as a GeoPandas core dep).
- Geocoding etiquette. Free Nominatim usage requires a descriptive custom User-Agent and a rate of at most about 1 request per second; `geopy` provides the provider abstraction and rate-limiting helpers. `geopy` is stable but last released Nov 2023.
- Security. The only first-party CVE in the core stack is GeoPandas CVE-2025-69662 (`to_postgis` SQL injection, fixed 1.1.2+) — relevant to the data scientist and ETL engineer who push to PostGIS.
- Install reality (2026). The historic GDAL/GEOS/PROJ build pain is largely gone: `shapely`, `pyproj`, `fiona`, and `pyogrio` ship self-contained wheels, and `geojson`/`geopy`/`folium`/`exifread` are pure-Python — so even the heavier personas install cleanly in CI and containers.
The One-Line Heuristic per Persona#
- Photo mapper: stay tiny — `geojson` + `folium`, read EXIF, never reproject.
- Data scientist: `geopandas` is your pandas; reproject before you measure.
- ETL engineer: `pyogrio` to move fast, `fiona` to stream, `pyproj` to keep CRS honest.
- App developer: `geopy` to find it, `folium`/`geojson` to show it — skip the GIS stack.
- Field researcher: project to metric first, then `shapely` for the numbers and `folium` for the picture.
Use Case: Turning Geotagged Photos into a Web Map#
Who They Are#
A hobbyist photographer, a travel blogger, a small-museum volunteer, or a weekend hacker. They have a folder of JPEGs from a phone or GPS-enabled camera — a hike, a road trip, a survey of historic buildings — and they want a single web page where the photos appear as pins on an interactive map.
They are comfortable running a Python script someone showed them, but they are not running a server. The output has to be a static page they can drop on GitHub Pages, Netlify, or a USB stick. They have heard of “GeoJSON” because a tutorial mentioned it, and they have heard of Leaflet because OpenStreetMap maps on the web tend to use it. They do not know what a CRS is and should not have to.
The Goal#
Read the GPS coordinates already embedded in their photos, produce a small data file describing each photo’s location (and maybe a thumbnail or caption), and render that as clickable markers on an OpenStreetMap basemap — all without standing up a backend.
Success is a .geojson file plus an HTML page (or a self-contained HTML file)
that opens in any browser and shows their trip.
Constraints & Pain Points#
- No backend, no budget. Anything requiring a database, a tile server, or a paid API key is out of scope. The deliverable is static files.
- Install friction is fatal. This persona will abandon the project if the first dependency demands a C compiler or a system GDAL build. They need pip-installable wheels that “just work.”
- They don’t understand coordinate systems. Latitude/longitude is the only mental model they have. A library that asks “which CRS?” loses them.
- EXIF is fiddly. GPS in EXIF is stored as degrees/minutes/seconds rationals plus N/S/E/W reference letters — not as a friendly decimal number. Some photos have no GPS at all (screenshots, edited exports), and the code must skip those gracefully rather than crash.
- Small data. Tens to low hundreds of photos. Performance is irrelevant; reliability and simplicity are everything.
Capabilities That Matter#
| Capability | Needed? | Why |
|---|---|---|
| EXIF GPS extraction | Yes | The coordinates live inside the photos. |
| DMS → decimal conversion | Yes | EXIF gives DMS rationals; the map wants decimals. |
| GeoJSON writing | Yes | The portable, web-native output format. |
| Map visualization | Yes | The whole point is a viewable map. |
| Coordinate reference systems | No | GeoJSON is WGS84 by spec; phone GPS is WGS84. No reprojection. |
| Geometry engine (buffers, joins) | No | Points only; no spatial operations. |
| Multi-format I/O (Shapefile/GPKG) | No | GeoJSON is the only format in play. |
| Geocoding | No | Coordinates already exist; no address lookup. |
| Analysis at scale | No | Hundreds of points, not millions. |
The defining feature of this persona is how much they get to leave out.
Because GeoJSON is WGS84 (EPSG:4326, lon-lat) by RFC 7946 and phone/camera GPS
is already WGS84, there is no reprojection step — pyproj is unnecessary.
Because there are no spatial operations, shapely is optional. Because the only
format is GeoJSON, the GDAL-backed I/O stack (fiona/pyogrio/geopandas) is
pure overhead.
Which Libraries Serve Them#
A deliberately tiny stack:
- EXIF reading — `exifread` (or `Pillow`). `exifread` is a clean, read-only EXIF extractor with no heavy dependencies, ideal for “just get me the GPS tags.” If the script is already opening images with `Pillow` (for thumbnails or captions), reading EXIF GPS through Pillow avoids a second library — at the cost of pulling in a large image parser with a heavier CVE history. For a read-only “grab the coordinates” job, `exifread` is the lighter, safer default.
- DMS → decimal — hand-rolled. This is not a library decision. EXIF GPS comes out as three rationals (deg, min, sec) plus a reference letter; converting to a signed decimal degree is a few lines. No package is warranted.
- GeoJSON output — `geojson`. Pure-Python, zero dependencies, models Features and FeatureCollections directly. For a WGS84-only, points-only job it is the perfect fit and the most reliable thing to install.
- The map — `folium`. Renders a Leaflet/OpenStreetMap map from Python and writes a self-contained HTML file. It can consume the GeoJSON layer and add popups with captions or thumbnails — exactly the static, no-backend deliverable this persona wants.
What they should not install: pyproj, shapely, geopandas,
fiona/pyogrio, geopy. Each is real weight for capabilities this job does
not use.
What Success Looks Like#
The photographer runs one script against a folder, gets a .geojson and an HTML
map, opens it locally, sees their trip as pins on OpenStreetMap, clicks a pin and
sees the photo’s caption — then uploads the HTML to a free static host and shares
the link. No server was provisioned, no API key was bought, and nothing needed a
compiler. Photos without GPS were silently skipped. The total dependency
footprint is two or three pure-Python packages.
When This Persona Applies#
- Source data is images with embedded GPS, not a spreadsheet or a database.
- The deliverable is a static web map, not an analysis or a dataset.
- Everything is WGS84 lat/lon and stays that way.
- Volume is small; correctness and zero-friction install beat performance.
- There is no backend and the budget for one is zero.
If any of these flip — addresses instead of coordinates, thousands of records needing joins, a non-WGS84 source, a Shapefile to ingest — the persona shifts toward one of the heavier profiles (web-app developer, data scientist, ETL engineer) and the stack grows accordingly.
Use Case: The GIS / Vector ETL Engineer#
Who They Are#
A data engineer or GIS developer whose job is moving vector data between systems and formats. They sit between data providers and data consumers: a government agency ships Shapefiles, a partner sends GML, the warehouse wants GeoPackage, the web team wants GeoJSON, and someone always wants a Shapefile back. They build pipelines — scheduled jobs, batch converters, ingest scripts — not interactive notebooks. Reliability, throughput, and fidelity matter more than visualization.
They are a serious programmer, comfortable with streaming, memory budgets, and error handling. They have likely fought GDAL builds before and have strong opinions about install reliability.
The Goal#
Convert and move vector datasets faithfully and at volume. Read a 5-million-row Shapefile, filter and transform it, write it as GeoPackage and GeoJSON. Normalize a stack of mixed-format inputs into one canonical format. Stream large datasets feature-by-feature so a multi-gigabyte file doesn’t have to fit in memory. Preserve attributes, geometry types, and the coordinate reference system through every hop without corruption.
Constraints & Pain Points#
- Format zoo. Shapefile (and its dBASE field-name and size quirks), GeoPackage, GeoJSON, GML, KML, and others — each with its own gotchas around field-name length, encoding, geometry-type strictness, and CRS metadata.
- Volume and memory. Files too large to load whole. The pipeline must stream records, not materialize them all in RAM.
- Fidelity. Attributes must survive (including encodings and field types), geometries must not be silently dropped or coerced, and the CRS must be carried and, when required, reprojected correctly. Shapefile’s column-name truncation and single-geometry-type limits are classic data-loss traps.
- Throughput. Conversions run on a schedule against many files; slow I/O multiplies across a batch.
- Install reliability in CI/containers. The pipeline runs in headless environments; dependencies must install from wheels without a system GDAL.
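The Shapefile column-name trap can be caught before any data is written. The sketch below is a pre-flight check in plain Python, assuming ASCII field names and the 10-character dBASE field-name limit; the schema and the helper names `too_long` and `truncation_collisions` are hypothetical. It only flags names that would be altered or would collide. How a particular writer resolves a collision is not modeled here.

```python
from collections import defaultdict

DBF_FIELD_NAME_LIMIT = 10  # dBASE field names hold at most 10 characters


def too_long(field_names, limit=DBF_FIELD_NAME_LIMIT):
    """Names that would be altered by a Shapefile write."""
    return [name for name in field_names if len(name) > limit]


def truncation_collisions(field_names, limit=DBF_FIELD_NAME_LIMIT):
    """Map each truncated name to the source names that collapse onto it.

    Only truncated names shared by more than one source field are returned.
    """
    buckets = defaultdict(list)
    for name in field_names:
        buckets[name[:limit]].append(name)
    return {short: names for short, names in buckets.items() if len(names) > 1}


# Hypothetical schema from a GeoPackage about to be exported as Shapefile.
schema = ["population_2020", "population_2021", "name", "median_income"]
print(too_long(schema))
print(truncation_collisions(schema))
```

Running a check like this at the start of a conversion job turns a silent data-loss risk into an explicit failure or an explicit rename map.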
Capabilities That Matter#
| Capability | Needed? | Why |
|---|---|---|
| Multi-format vector read/write | Yes (core) | The entire job is format-to-format. |
| Streaming / record-by-record I/O | Yes | Files exceed memory; must not load whole. |
| CRS read + reprojection | Yes | Carry/convert coordinate systems across formats. |
| Attribute/schema fidelity | Yes | Field types, encodings, names must survive. |
| Geometry validation/repair | Often | Source data has invalid/self-intersecting geometries. |
| Dataframe analysis | Optional | Sometimes transform in bulk; not the core. |
| Visualization | No | Output is files, not maps. |
| Geocoding / EXIF | No | Inputs are vector datasets. |
Which Libraries Serve Them#
The center of gravity is format I/O:
- pyogrio: the modern, vectorized GDAL/OGR reader-writer. It is fast, reads and writes the major OGR formats (Shapefile, GeoPackage, GeoJSON, GML, etc.), and is now the preferred engine in the ecosystem. For bulk read-transform-write, this is the default workhorse.
- fiona: the long-standing record-by-record OGR binding. Its iterator model is the natural fit for streaming very large files feature-by-feature under a tight memory budget, the one place it still has an edge over the vectorized pyogrio. Caveat: fiona has had no release since September 2024 and was dropped as a GeoPandas core dependency in favor of pyogrio, so treat it as mature-but-static and prefer pyogrio where streaming isn’t required.
- pyproj: CRS identification and reprojection. Essential for carrying coordinate systems faithfully across formats and reprojecting when targets demand it. This persona always needs it.
- shapely: geometry validation and repair. Source data routinely contains invalid or self-intersecting geometries; shapely checks validity and fixes geometries so downstream formats accept them.
- geopandas: optional, for transforms that are easiest expressed as bulk dataframe operations (filter, reproject, dissolve) before writing back out. Pulls in pyogrio anyway.
- geojson: handy when the only target is GeoJSON and a zero-dependency, pure-Python writer keeps the container image lean.
What they leave out: folium (no maps), geopy (no geocoding), the EXIF
readers (no photos).
CVE note: the only first-party advisory in the core stack is GeoPandas CVE-2025-69662 (to_postgis SQL injection, fixed in 1.1.2+). fiona’s advisories are transitive via GDAL and already mitigated in current wheels, but its release staleness is the real thing to track.
Install reality (2026): the historic GDAL/GEOS/PROJ build pain is largely gone. shapely, pyproj, fiona, and pyogrio all ship self-contained binary wheels, so the pipeline installs cleanly in CI and containers without a system GDAL.
What Success Looks Like#
A scheduled job ingests a multi-gigabyte Shapefile by streaming it, repairs a handful of invalid geometries, reprojects to the target CRS, and writes clean GeoPackage and GeoJSON outputs — with every attribute, encoding, and the CRS intact — inside the memory budget, from a container that pip-installed everything from wheels. The same pipeline runs nightly across dozens of files without manual intervention.
When This Persona Applies#
- The job is format conversion and movement, not analysis or display.
- Multiple vector formats are involved, with fidelity requirements.
- File sizes demand streaming or careful memory management.
- The code runs unattended in CI/containers, so install reliability is a first-class requirement.
- CRS must be carried and reprojected correctly across hops.
Use Case: The Field Researcher with Coordinates#
Who They Are#
A scientist or graduate student who collects data in the field with locations attached: an ecologist logging animal sightings or nest sites, an archaeologist recording artifact finds, an epidemiologist mapping case locations, a botanist sampling plots, a geologist marking outcrops, a social scientist surveying households. They carry a handheld GPS or a phone, come home with a spreadsheet of coordinates plus observations, and need to make sense of it.
They are domain experts first and programmers second. They tolerate Python (often via a notebook a collaborator set up) but their patience for tooling and build errors is low. They want answers and figures for a paper or a report, not a software project.
The Goal#
Take a table of field points (lat/lon plus attributes) and:
- Compute distances between points (how far apart are nest sites? what’s the spread of finds?).
- Compute areas (the extent of a sampled region, a home range, a survey polygon).
- Cluster points to find hotspots (disease clusters, aggregation sites).
- Visualize quickly — a map for a field report, a figure for a paper, a shareable interactive map for collaborators.
Success is correct distance/area numbers and a clear map, produced without a GIS-software learning curve.
Constraints & Pain Points#
- Measurement correctness is everything. Distances and areas computed on raw lat/lon degrees are wrong — degrees are not meters, and a degree of longitude shrinks toward the poles. The researcher needs metric results and may not realize the raw-degree trap exists. This is the single most common silent error in field-data analysis, and it’s a CRS/projection issue.
- Small-to-moderate data, but precision matters. Often hundreds to a few thousand points. Volume is not the problem; getting the numbers right and defensible for peer review is.
- Low tooling tolerance. A failed compile or a CRS prompt they don’t understand can stall the whole analysis. Clean wheels and sane defaults matter.
- Reproducibility for publication. The pipeline must be re-runnable and the method describable in a methods section.
- Quick, shareable visuals. Collaborators are remote; a static plot or an HTML map that can be emailed is far more useful than a desktop-GIS project file.
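The raw-degree trap is easy to demonstrate with nothing but the standard library. The sketch below compares Pythagoras on raw degrees with a great-circle (haversine) distance on a sphere. It is an illustration of the problem, not a substitute for a proper CRS library: the spherical Earth radius is an approximation, and the site coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def naive_degrees(lat1, lon1, lat2, lon2):
    """The trap: Pythagoras on raw degrees. The result is in degrees, not meters."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Two pairs of hypothetical sites, each one degree of longitude apart.
equator = (0.0, 10.0, 0.0, 11.0)
subarctic = (60.0, 10.0, 60.0, 11.0)

for label, pair in (("equator", equator), ("60N", subarctic)):
    print(label, naive_degrees(*pair), round(haversine_m(*pair)))
```

Both pairs look identical in raw degrees, yet the pair at 60°N is only about half as far apart on the ground as the pair at the equator (roughly 111 km). That gap is the error a methods section has to rule out.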
Capabilities That Matter#
| Capability | Needed? | Why |
|---|---|---|
| Distance / area computation | Yes (core) | The headline measurements. |
| CRS / reprojection to metric | Yes | Degrees aren’t meters; must project to measure. |
| Clustering / hotspot detection | Yes | Finding aggregations and hotspots. |
| Quick visualization | Yes | Figures and shareable maps. |
| Geometry primitives | Yes | Points, polygons, distances, buffers. |
| Dataframe handling | Often | Field data starts as a table. |
| Geocoding | Rarely | Coordinates collected directly; addresses uncommon. |
| EXIF GPS | Sometimes | If locations come from geotagged field photos. |
| Multi-format ETL | No | Not a data-plumbing job. |
Which Libraries Serve Them#
A modest, correctness-focused stack:
- shapely: the geometry engine for distances, areas, buffers, and polygon construction from field points. The everyday tool for “how far,” “how big,” and “what’s the extent.”
- pyproj: the quiet hero here. To get distances and areas in meters, the researcher must reproject from WGS84 lat/lon into an appropriate metric CRS before measuring. pyproj does this (and offers geodesic distance on the ellipsoid directly). Skipping this step is the classic field-data error; including it is what makes the numbers publishable.
- geopandas: when the field data is a table of points, geopandas keeps it in a dataframe, handles the reprojection in bulk, supports clustering workflows, and produces quick .plot() figures. The natural backbone when the data starts as a spreadsheet. It pulls in shapely and pyproj anyway.
- folium: for an interactive, emailable map of the sites for collaborators or a field report, rendered straight from Python as self-contained HTML.
- exifread: only if locations were captured as geotagged field photos rather than logged coordinates; it extracts the EXIF GPS (DMS rationals needing a short DMS→decimal conversion) to recover the points. Overlaps with the photo-mapping persona at the ingestion step.
Clustering itself (e.g. density-based hotspot detection) typically comes from a general ML library rather than this geo stack; the geo libraries supply the projected coordinates and geometry that feed it, and geopandas makes wiring the two together straightforward.
What they can leave out: fiona/pyogrio (no multi-format ETL), geopy
(coordinates are collected, not geocoded), and the GeoJSON-writer geojson
unless they specifically need to hand off a GeoJSON file.
What Success Looks Like#
The researcher loads their field spreadsheet, reprojects the points to a metric
CRS, and reports inter-site distances and a sampled-area figure in meters and
square meters that survive peer review — then clusters the points to flag a
hotspot and emails collaborators an interactive folium map of the sites. The
analysis re-runs from the notebook for the revision, and the methods section can
honestly say the coordinates were projected before measurement.
When This Persona Applies#
- Data is field-collected coordinates (or geotagged field photos) plus observations.
- The needed outputs are distances, areas, clusters, and figures.
- Measurement correctness matters for publication — so reprojection to a metric CRS is mandatory, not optional.
- Volume is modest; the bottleneck is rigor and clear visuals, not scale.
- The work is analysis-for-a-paper, distinct from app features (web developer), bulk conversion (ETL engineer), or a pure web map (photo mapper).
Use Case: Spatial Data Science at Scale#
Who They Are#
A data analyst or data scientist who already lives in the pandas/NumPy world and now has a spatial dimension to their data. They work in notebooks, ship results as reports, dashboards, or model features, and think in terms of dataframes, joins, and aggregations. Their domains vary — public health, retail siting, logistics, real estate, epidemiology, transport, environmental analysis — but the shape of the work is the same: combine tabular records with geometries and ask spatial questions of them.
They are fluent programmers but not GIS specialists. They want spatial operations to feel like the dataframe operations they already know, and they want the answer, not a tour of computational geometry.
The Goal#
Answer questions that mix attributes and space: “Which census tracts fall inside each delivery zone, and what’s the total population served?” “Join these incident points to the neighborhood polygons they fall within and compute the rate per capita.” “Clip this national dataset to one state and produce a choropleth.” The output is usually a table of results, a choropleth map for a report, or an enriched feature set feeding a model.
Constraints & Pain Points#
- Volume. Tens of thousands to millions of features. Point-by-point Python loops are too slow; they need vectorized, columnar operations.
- Heterogeneous CRS. Source layers arrive in different coordinate systems — one in WGS84, another in a national grid, a third in Web Mercator. Joining or measuring across them without aligning the CRS produces silently wrong answers. Area and distance in degrees are meaningless; they must reproject to a metric CRS before measuring.
- The join is the hard part. Spatial joins (point-in-polygon, polygon overlay/intersection, nearest) are the workhorse and the most error-prone step.
- Reproducibility. Results go in reports and decisions; the pipeline must be re-runnable in a notebook and produce the same numbers.
- They don’t want to leave pandas. Anything that forces a different mental model than dataframes adds friction.
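To make the heterogeneous-CRS hazard concrete, the sketch below implements the forward spherical Mercator formula by hand and shows how the same place gets entirely different numbers in WGS84 degrees and Web Mercator meters. It is purely illustrative: in real work the reprojection belongs to the CRS library, not to hand-written formulas. The function names and the sample point are assumptions made for this example.

```python
import math

WGS84_SEMI_MAJOR_M = 6_378_137.0  # sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Mercator: WGS84 degrees -> Web Mercator meters."""
    x = WGS84_SEMI_MAJOR_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat) / 2)
    )
    return x, y


def mercator_scale_factor(lat):
    """How much Web Mercator stretches lengths at a given latitude."""
    return 1.0 / math.cos(math.radians(lat))


# A hypothetical point in (lon, lat) order.
point_lonlat = (-0.1276, 51.5072)
x, y = lonlat_to_web_mercator(*point_lonlat)
print(point_lonlat, "->", (round(x), round(y)))
print("length stretch at 60N:", round(mercator_scale_factor(60.0), 3))
```

Two things follow. A join between a layer in degrees and a layer in Web Mercator meters compares numbers that have nothing to do with each other and will typically match nothing, without raising an error. And Web Mercator is itself a poor choice for measurement, since lengths are stretched by a factor of about two at 60° latitude (and areas by about four).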
Capabilities That Matter#
| Capability | Needed? | Why |
|---|---|---|
| Vectorized geometry on many rows | Yes | Spatial joins/overlays over large sets. |
| Spatial joins & overlays | Yes | The central operation of the work. |
| CRS handling / reprojection | Yes | Layers arrive in mixed systems; measurement needs metric CRS. |
| Choropleth / quick visualization | Yes | Reports and exploratory maps. |
| Multi-format input | Often | Sources may be Shapefile, GeoPackage, GeoJSON. |
| Geometry primitives | Yes (via dataframe) | Underlies joins/overlays; used through the dataframe, not by hand. |
| Geocoding | Sometimes | Only if inputs are addresses, not coordinates. |
| EXIF GPS | No | Source is datasets, not photos. |
Which Libraries Serve Them#
The center of gravity is geopandas — it is the spatial analog of pandas and
the natural home for this persona. It puts geometries in a dataframe column,
exposes spatial joins, overlays, dissolves, and aggregations as dataframe
operations, and integrates plotting for quick choropleths.
Around it:
- geopandas: the analysis engine. Spatial joins, overlays, group-bys, dissolves, and .plot() choropleths over large datasets, all in the dataframe idiom the persona already knows.
- shapely: the geometry layer underneath. The persona rarely calls it directly, but it powers per-geometry operations and is there for the occasional custom buffer or predicate on a single shape.
- pyproj: the CRS authority. Reprojecting layers into a common, metric CRS before measuring or joining is non-negotiable here; geopandas delegates this to pyproj. Unlike the WGS84-only photo mapper, this persona genuinely needs it.
- pyogrio (with fiona as the older fallback): fast reading of input layers. pyogrio is now the preferred GeoPandas I/O engine and is dramatically faster than fiona on large files; fiona still works but has not seen a release since September 2024.
- folium: when an interactive choropleth or a shareable web map is the deliverable rather than a static notebook plot. Optional.
What they can usually leave out: the EXIF readers (no photos), and geopy
unless their inputs are addresses needing geocoding before analysis.
Watch item: geopandas carries the only first-party CVE in the core stack, CVE-2025-69662, a SQL injection in to_postgis, fixed in 1.1.2+. Analysts pushing results to PostGIS should be on a patched version.
What Success Looks Like#
The analyst loads several layers in mixed CRS, reprojects them to a common metric system, runs a spatial join and a group-by, and gets a clean results table in seconds rather than minutes — then drops a choropleth into the report. The whole pipeline reads like pandas, re-runs deterministically in the notebook, and the population/area numbers are trustworthy because the CRS was handled explicitly.
When This Persona Applies#
- Data is tabular + geometric and lives in dataframes.
- The work is analysis (joins, overlays, aggregation), not serialization or format conversion.
- Volume rules out per-row Python loops.
- Multiple CRS are in play and measurements must be metric.
- The deliverable is a result, a model feature, or a choropleth — not a format-converted file (that’s the ETL engineer) and not an app feature (that’s the web developer).
Use Case: Adding Maps & Geocoding to an App#
Who They Are#
An application developer — backend, full-stack, or desktop — building a product that has a location feature bolted onto a larger system. The app might be a store locator, a field-service dispatch tool, a real-estate listing site, a delivery tracker, or an internal admin tool. Geography is a feature, not the whole product. They are not a GIS specialist and have no desire to become one; they want to take a user-entered address, turn it into coordinates, store it, and show it on a map.
They think in terms of requests, responses, models, and UI. They care about API behavior, rate limits, terms of service, and not shipping a flaky feature.
The Goal#
Two recurring jobs:
- Geocoding — convert a human address (“123 Main St, Springfield”) into latitude/longitude to store and search on, and sometimes the reverse: turn coordinates from a GPS-enabled client back into a human-readable address.
- Rendering — show one or many results on an interactive map: the located address, nearby stores, a route’s endpoints, the user’s saved places.
Success is a working location feature: the user types an address, the app pins it on a map, and nearby results appear — reliably, within the rate and licensing limits of whatever geocoding service is used.
Constraints & Pain Points#
- Addresses, not coordinates. Input is messy human text, not clean lat/lon. Geocoding is the gateway step and the one most likely to fail or rate-limit.
- Rate limits and terms of service. Free geocoders are governed. Nominatim (OpenStreetMap) allows roughly one request per second and requires a descriptive custom User-Agent; abusing it gets the app blocked. Commercial geocoders need keys and have quotas. The developer must pick a provider and respect its rules, or batch-geocode and cache.
- Provider portability. Today it’s Nominatim; tomorrow it’s Google, Mapbox, or a paid service. They want to swap providers without rewriting.
- Rendering target varies. A web app may hand GeoJSON to a JavaScript Leaflet front end; an internal tool or a quick admin view may want the map generated server-side and served as HTML.
- They don’t want a GIS stack. Heavy analysis libraries are overkill; the app needs lookup and display, not spatial joins.
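The "throttle and cache" requirement can be sketched independently of any particular geocoding library. The wrapper below takes any callable that maps an address to a result, enforces a minimum interval between upstream calls, and serves repeats from memory. Everything here is hypothetical scaffolding: the class name, the stub provider, and its made-up coordinates are assumptions for illustration, and a production version would likely persist the cache and handle provider errors.

```python
import time


class ThrottledCachedGeocoder:
    """Wrap any `address -> result` callable with a minimum interval between
    upstream calls and an in-memory cache."""

    def __init__(self, lookup, min_interval_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._lookup = lookup
        self._min_interval_s = min_interval_s
        self._clock = clock      # injectable so tests need not really wait
        self._sleep = sleep
        self._last_call = None
        self._cache = {}

    def geocode(self, address):
        key = " ".join(address.lower().split())  # normalize case and whitespace
        if key in self._cache:
            return self._cache[key]  # cache hits never touch the provider
        if self._last_call is not None:
            wait = self._min_interval_s - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._lookup(address)
        self._cache[key] = result  # misses (None) are cached too
        return result


# Stub provider standing in for a real geocoding service.
stub_provider = {"123 Main St, Springfield": (39.8, -89.6)}  # made-up coordinates
geocoder = ThrottledCachedGeocoder(stub_provider.get)
print(geocoder.geocode("123 Main St, Springfield"))
print(geocoder.geocode("123 main st,  springfield"))  # served from cache
```

Because the provider is just a callable, swapping services means passing a different function, which is one simple way to get the portability this persona wants. Caching failed lookups is a deliberate choice here to avoid hammering a free service with addresses it cannot resolve; whether that is right depends on the application.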
Capabilities That Matter#
| Capability | Needed? | Why |
|---|---|---|
| Forward geocoding (address → coords) | Yes (core) | The primary job. |
| Reverse geocoding (coords → address) | Often | GPS clients send coordinates back. |
| Provider abstraction | Yes | Swap/compare geocoding services without rewrites. |
| Rate-limit handling | Yes | Free providers throttle; must throttle/cache. |
| Map rendering | Yes | Show results to the user. |
| GeoJSON serialization | Often | Feed a JS map front end or an API response. |
| CRS handling | Rarely | Geocoders return WGS84; GeoJSON is WGS84. Usually skippable. |
| Spatial analysis at scale | No | Lookup + display, not joins/overlays. |
| EXIF GPS | No | Input is addresses, not photos. |
Which Libraries Serve Them#
The center of gravity is geocoding plus light rendering:
- geopy: the geocoding workhorse. It wraps many providers (Nominatim, Google, Mapbox, and others) behind one interface, giving the developer the provider portability they want and built-in rate-limiting helpers for well-behaved use of free services like Nominatim. Always set a descriptive custom User-Agent for Nominatim and stay within ~1 req/sec, or cache/batch. Note that geopy has not had a release since November 2023 but remains widely used and functional.
- folium: server-side map rendering. When the app or an admin view needs a Leaflet/OpenStreetMap map produced from Python and served as HTML, folium turns located points into an interactive map with popups. Ideal for desktop tools, internal dashboards, and server-rendered views.
- geojson: when the front end is JavaScript Leaflet (or the API returns features), this pure-Python, zero-dependency package builds the GeoJSON payload the client renders. Lightweight and a clean fit for an API boundary.
- shapely: optional, only if the app does light geometry (is this point inside this delivery zone? distance between two pins?). Most lookup-and-display apps don’t need it.
What they should leave out for a typical lookup-and-display feature:
geopandas, fiona/pyogrio, pyproj (geocoders and GeoJSON are both WGS84),
and the EXIF readers. Pulling in the analysis/ETL stack for a store locator is
pure liability.
What Success Looks Like#
A user enters an address, the app geocodes it through geopy (respecting the
provider’s rate limit and identifying itself properly), stores the coordinates,
and shows the result on a map — either a folium-rendered HTML view or a Leaflet
front end fed a geojson payload. Nearby saved places appear as additional pins.
The feature is reliable, the provider can be swapped by changing one config, and
no oversized GIS dependency was dragged into the app’s image.
When This Persona Applies#
- Location is a feature inside a larger app, not the product.
- Input is addresses (needs geocoding) or coordinates from a client (needs reverse geocoding).
- The deliverable is lookup + display, not analysis or bulk conversion.
- Everything stays in WGS84, so CRS handling is typically unnecessary.
- Provider rate limits and terms must be respected — a real operational constraint, not an afterthought.
S4: Strategic
S4 — Strategic Selection: Approach#
What This Phase Evaluates#
S1 and S2 answered what each library does and how it works. S4 answers a different question: if we adopt this library, will it still be a safe place to have built in five years? Features age; governance, funding, and ecosystem gravity are what determine whether a dependency quietly becomes a liability.
For each of the six geographic libraries we assess seven viability dimensions:
- Project governance & maintainer org — Who owns the repo? Is it a single hobbyist, an org, or a foundation-backed project with succession in place?
- Funding & sponsorship — Is there money or institutional backing (NumFOCUS, OSGeo, corporate sponsors), or is it pure volunteer time?
- Release cadence & recent activity — Are releases regular and recent, or has the project drifted into quiet maturity (or neglect)?
- Community size & ecosystem centrality — Stars and downloads are proxies, but the real question is how many other things break if this disappears.
- Bus-factor / maintenance risk — How many people understand the codebase? What happens if the lead maintainer steps away?
- License & lock-in — Permissive vs. copyleft, transitive native-library obligations (GEOS/GDAL/PROJ), and how much proprietary data lives in the API shape you’d have to rewrite.
- Migration story — If you had to leave, how realistic is the exit? What would you migrate to, and how much code is at stake?
How It Was Executed#
- Verified facts (release versions/dates, stars, monthly downloads, license, governing org, known CVEs) were taken from the S1/S2 fact base and treated as ground truth; no additional version numbers or metrics were invented.
- Each library was scored category-first — its standing in the whole geospatial-Python stack — rather than against a single task. A library can be a weak standalone choice yet a rock-solid foundation because everything else depends on it (this is exactly the shapely/pyproj story).
- We explicitly separated activity from health. A library with no 2025/2026 release is not automatically at risk; for a mature, narrow-scope library (geopy, fiona) low churn can be a maturity signal rather than a neglect signal. We flag the distinction rather than penalize quiet projects reflexively.
- Output: one viability file per library, then a consolidated recommendation.md with a governance / funding / cadence / risk summary table and a migration appendix.
fiona / pyogrio — Strategic Viability#
fiona v1.10.1 (Sep 2024; no 2025/2026 release) · ~1.2k stars · ~5.7M downloads/mo · BSD-3 · org: Toblerity · transitive GDAL CVEs mitigated
pyogrio: the ascendant successor for vector I/O (now geopandas’ core engine)
One-Line Strategic Read#
These are the vector I/O layer — the Python bindings to GDAL/OGR that read and write shapefiles, GeoPackage, GeoJSON, and dozens of other formats. The strategic story here is a generational handoff: fiona is entering graceful maturity mode while pyogrio is the ascendant successor, already adopted by geopandas as its core I/O engine. Plan new work around pyogrio; keep fiona for what already uses it.
Why These Are Covered Together#
fiona and pyogrio are not competitors in the usual sense — they are two generations of the same capability (GDAL/OGR vector I/O exposed to Python). fiona is the incumbent, record-oriented binding; pyogrio is the newer, columnar, dramatically faster binding designed for the GeoDataFrame era. Evaluating one without the other would misread the trajectory, so this S4 assesses the transition.
Project Governance & Maintainer Org#
fiona lives under the Toblerity organization, a long-established geospatial-Python group that was also shapely’s original home. Governance is sound and mature; this is not an abandoned repo, it is a stable one. pyogrio emerged from the GeoPandas-adjacent community as a purpose-built modern I/O layer and has been promoted into the core stack by geopandas. That is the strongest possible governance endorsement, since the integration hub of the ecosystem chose to depend on it, and it effectively anoints pyogrio as the strategic direction for vector I/O in Python.
Funding & Sponsorship#
Both follow the ecosystem-gravity model rather than carrying a named corporate sponsor: maintainer attention comes from contributors embedded in the geospatial-Python community. pyogrio’s adoption by geopandas (a NumFOCUS project) indirectly couples its sustainability to that better-funded project’s needs — a favorable signal for pyogrio’s future. fiona’s funding is in proportion to its declining role: enough to keep it stable, not enough (or not the intent) to keep adding features.
Release Cadence & Recent Activity#
This is the crux. fiona’s latest release is v1.10.1 (Sep 2024), with no 2025 or 2026 release — a clear move into maturity / slow-maintenance mode. Read carefully: this is not the same as abandonment. fiona is a mature wrapper over a mature C library (GDAL); it can be stable and useful with infrequent releases. But the direction of travel is unmistakable — feature energy has shifted to pyogrio, and geopandas dropping fiona as a core dependency removed the single biggest source of pull on fiona’s roadmap.
pyogrio, by contrast, is in active development and is the place new vector-I/O capability lands. For cadence-based viability scoring, pyogrio is “active / ascending” and fiona is “stable / slowing.”
Community Size & Ecosystem Centrality#
fiona’s ~1.2k stars and ~5.7M downloads/month still reflect a large installed
base — years of tutorials, scripts, and downstream tools were written against its
fiona.open() record API, and that legacy footprint will persist for a long time.
But its centrality is declining: the moment geopandas switched its default engine
to pyogrio, the single largest consumer stopped routing through fiona. pyogrio’s
centrality is the inverse — rising fast precisely because it now sits underneath
the ecosystem’s front door. Strategically, the flow of new dependence is moving
from fiona to pyogrio even as fiona’s stock of existing dependence remains large.
Bus-Factor / Maintenance Risk#
Moderate, and asymmetric. fiona in slow-maintenance mode carries the classic risk that a mature-but-quiet wrapper accrues: if GDAL ships a breaking change or a serious CVE, someone has to be motivated to cut a fiona release — and with energy elsewhere, that response could be slower than for an active project. pyogrio carries the opposite risk profile of any younger project (smaller historical contributor pool) offset by the strong tailwind of being geopandas’ chosen engine, which keeps capable maintainers engaged. Both ultimately inherit GDAL’s own maintenance health, which is robust (OSGeo-backed).
License & Lock-In#
BSD-3 (fiona; pyogrio likewise permissive), so no licensing friction. The strategic lock-in question is API shape, not data: data formats are GDAL’s open, universal set (shapefile, GPKG, GeoJSON, etc.) and are never trapped. The lock-in that matters is code written against fiona’s record-iterator API, which does not map one-to-one onto pyogrio’s columnar/DataFrame-oriented API. Teams with large fiona codebases face a real (if mechanical) porting cost.
Security note#
fiona’s risk surface is dominated by transitive GDAL CVEs, which are mitigated in the standard distribution (wheels track patched GDAL). The maturity-mode caveat above applies: keep an eye on whether security-relevant GDAL updates get reflected promptly in fiona releases, or plan to be on pyogrio where active maintenance makes that responsiveness more assured.
Migration Story#
The migration story here is unusually concrete because the ecosystem is already executing it: the realistic path is fiona → pyogrio, and geopandas has shown it works at scale. For most users who reach vector I/O through geopandas, the migration already happened transparently (the default engine switched underneath them). For code that calls fiona directly, migration means rewriting record-iterator logic into pyogrio’s columnar calls — straightforward but not free. Because the underlying formats are identical GDAL formats, there is zero data migration cost — this is purely an API port.
If you instead wanted to leave the fiona/pyogrio family entirely, the alternatives (GDAL’s own Python bindings, or pushing I/O into PostGIS) are heavier and rarely warranted; pyogrio is the modern answer.
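The porting cost comes from the difference in data shape, which can be illustrated without either library. The sketch below uses plain Python only and calls neither fiona nor pyogrio; the feature dictionaries and the helper `records_to_columns` are hypothetical stand-ins for "one record at a time" versus "one column at a time" code.

```python
def records_to_columns(records):
    """Pivot an iterable of {field: value} records into {field: [values]}.

    Missing fields are filled with None so every column has equal length.
    """
    columns = {}
    count = 0
    for record in records:
        for name in record:
            if name not in columns:
                columns[name] = [None] * count  # backfill rows seen so far
        for name, values in columns.items():
            values.append(record.get(name))
        count += 1
    return columns


features = [
    {"name": "A", "pop": 120},
    {"name": "B", "pop": 80},
    {"name": "C"},  # missing attribute
]

# Record-oriented style: logic runs once per feature.
big_by_record = [f["name"] for f in features if f.get("pop", 0) > 100]

# Column-oriented style: pivot once, then work on whole columns.
cols = records_to_columns(features)
big_by_column = [n for n, p in zip(cols["name"], cols["pop"]) if (p or 0) > 100]

print(big_by_record, big_by_column)
```

Both styles give the same answer, but the per-feature conditionals of the first have to be re-expressed as whole-column operations in the second. That rewrite, rather than anything about the data, is what a direct fiona codebase pays for when it moves.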
Verdict#
- pyogrio: safe long-term bet — adopt as the default for new vector I/O. It is the ascendant, actively-developed, geopandas-blessed successor. New code should target it.
- fiona: watch item — stable, useful, but slowing. Keep it where it already works and where its mature record API is convenient; do not architect new systems around it. Treat the absent 2025/2026 release as a signal to plan a pyogrio path, and monitor GDAL-CVE responsiveness. There is no urgency to rip fiona out — its installed base is large and stable — but the strategic direction is unambiguously toward pyogrio.
folium — Strategic Viability#
v0.20.0 (Jun 2025) · ~7.4k stars · ~3.4M downloads/mo · MIT · org: python-visualization/folium · no filed CVEs (but XSS-sink usage caveat)
One-Line Strategic Read#
folium is the default Python-to-Leaflet.js bridge — the most-starred library in this survey — that turns geospatial data into interactive browser maps without writing JavaScript. It is a safe, popular, actively-maintained presentation-layer choice; the one strategic asterisk is security hygiene around its HTML/JS output (an XSS-sink usage caveat, not a filed vulnerability).
Project Governance & Maintainer Org#
folium is governed by the python-visualization/folium organization — a
dedicated org rather than a personal repo, which is a healthier governance signal
than its visualization-wrapper category might suggest. It sits in a different
stratum from the geometry/CRS/I/O core: folium is a presentation layer, a thin
Python generator that emits Leaflet.js maps. Its governance health therefore has
two parts: the folium project itself (community-maintained, organizationally
homed) and its dependence on the upstream Leaflet.js ecosystem, which is large,
mature, and independently sustained. Strategically, folium rides Leaflet’s
gravity, which is substantial.
Funding & Sponsorship#
Community-maintained, no named corporate sponsor — typical for a visualization wrapper. The funding question is lower-stakes here than for the core libraries: folium’s job is to template HTML/JS around Leaflet, so its maintenance surface is modest, and the heavy lifting (the actual map rendering, tiles, interaction) is done by Leaflet and tile providers that have their own robust ecosystems. The diffuse community investment folium receives is adequate for a stable wrapper.
Release Cadence & Recent Activity#
Active and current: v0.20.0 (Jun 2025), with an ongoing 0.x release rhythm. Worth noting strategically: folium remains pre-1.0 even after many years and a large user base. In this case the 0.x version number reflects historical versioning conventions and a wrapper’s willingness to evolve its API, not immaturity — adoption and stability are well beyond what “0.20” implies. The cadence is healthy: regular releases, responsive to Leaflet changes and Python ecosystem shifts. No drift.
Community Size & Ecosystem Centrality#
folium is the most-starred library in the entire survey at ~7.4k stars, a reflection of how visible and approachable interactive mapping is (it’s the “wow, a map appeared in my notebook” library, which drives stars). Its ~3.4M downloads/month is the lowest in the survey, however, and that contrast is the key strategic insight: folium is high-visibility but narrow-footprint. It is widely known and beloved for demos, notebooks, dashboards, and reports, but it is a leaf in the dependency tree: almost nothing depends on folium transitively, because it is an end-of-pipeline presentation choice, not infrastructure. That means its failure mode is contained: if folium stalled, only visualization code would need rework, never your data or analysis layer.
Bus-Factor / Maintenance Risk#
Low-to-moderate. As a community wrapper, folium depends on a relatively small maintainer group, but the risk is well-contained by its position: a presentation wrapper is comparatively simple to maintain or, if necessary, replace. It also inherits the stability of Leaflet.js (a very mature, widely-deployed JS library), which insulates folium from needing constant deep rework. The combination of a dedicated org, an active 2025 release, and a simple, replaceable role keeps overall maintenance risk modest.
License & Lock-In#
MIT — permissive, no friction. Lock-in is low: folium consumes standard inputs (GeoJSON, coordinate data, tile URLs) and emits standard outputs (HTML + Leaflet.js). Nothing about your data is captured by folium; only your map-rendering code is folium-shaped, and that is a thin, replaceable layer.
Security note (usage caveat — action required)#
folium has no filed CVEs, but carries a real XSS-sink usage caveat: because folium generates HTML/JavaScript and lets you inject content (popups, tooltips, custom HTML) into the rendered map, unsanitized user-supplied data flowed into folium output can become a cross-site-scripting vector. This is a usage responsibility, not a library defect — but it is strategically important for any deployment that renders folium maps containing untrusted input (user-generated labels, externally-sourced attributes). The mitigation is standard: sanitize/escape any untrusted content before it reaches folium’s HTML-emitting APIs, and treat folium output as untrusted HTML when embedding it in a larger application.
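A minimal sketch of the escaping step, assuming the untrusted values are plain strings such as user-supplied labels. It uses only the standard library's `html.escape`; the helper name `safe_popup_html` and the sample strings are hypothetical, and the markup it wraps around the values is illustrative rather than anything folium requires.

```python
import html


def safe_popup_html(title: str, body: str) -> str:
    """Build popup markup with every untrusted value HTML-escaped."""
    # html.escape converts <, >, & and (by default) both quote characters
    # into entities, so the values render as text instead of markup.
    return f"<b>{html.escape(title)}</b><br>{html.escape(body)}"


# Hypothetical hostile label, e.g. from a user-editable store name.
hostile = "<script>alert(1)</script>"
popup_markup = safe_popup_html(hostile, 'Open "late" & weekends')
```

The resulting string is what would then be handed to a popup or tooltip. Escaping at the point where markup is assembled, rather than at data-ingest time, keeps the stored data unmodified and makes the sanitization easy to audit.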
Migration Story#
folium’s migration story is the easiest in the survey, by virtue of its leaf position. Because it consumes open formats (GeoJSON, coordinates) and produces a self-contained Leaflet artifact, swapping folium for an alternative presentation layer — a different Python mapping wrapper, direct Leaflet/Mapbox GL JS, or a plotting library’s map mode — touches only your visualization code, never your data or analysis. There is zero data lock-in and the replacement surface is small and well-bounded. You can adopt folium for visualization with confidence that, if needs change, exiting is cheap and localized.
Verdict#
Safe long-term bet for the presentation layer — with a standing security practice. folium is the most-starred, actively-maintained (v0.20.0, Jun 2025), MIT-licensed default for interactive Leaflet maps in Python, and its leaf-node position makes both its failure mode and its migration cost minimal. The one non-optional discipline is sanitizing untrusted data before it reaches folium’s HTML/JS output (the XSS-sink caveat). Adopt it freely for maps, dashboards, and reports; treat its output as untrusted HTML when untrusted input is involved.
geopandas — Strategic Viability#
v1.1.3 (~Mar 2026) · ~5.2k stars · ~16M downloads/mo · BSD-3 · org: GeoPandas (NumFOCUS fiscally sponsored) · CVE-2025-69662 (to_postgis SQL injection) fixed in 1.1.2+
One-Line Strategic Read#
geopandas is the integration hub of the geospatial-Python stack — the pandas-shaped surface that ties shapely, pyproj, and pyogrio/fiona together — and it is the only library in this survey with formal foundation backing (NumFOCUS). It is a safe long-term bet, with the single caveat of a recently-patched CVE that makes version hygiene non-optional.
Project Governance & Maintainer Org#
geopandas is governed by the GeoPandas organization and is fiscally sponsored by NumFOCUS. This is the strongest governance posture in the survey. NumFOCUS sponsorship means the project has a legal/financial home, a recognized governance model, and access to the same institutional support structure that backs NumPy, pandas, Matplotlib, and the rest of the scientific-Python core. For an organization making a multi-year bet, “this is a NumFOCUS project” is a materially stronger assurance than “this is a popular repo.”
Crucially, geopandas reached v1.0+ and is now on v1.1.x — a 1.x project under semantic-versioning discipline signals API stability commitments, which is exactly what downstream adopters want from an integration layer.
Funding & Sponsorship#
NumFOCUS fiscal sponsorship provides the funding/governance umbrella (donations, grants, fiscal hosting). Beyond that, geopandas benefits from the same diffuse ecosystem funding as the rest of the stack: many contributors are employed in roles where maintaining the geospatial-Python toolchain is part of the job. The combination of an explicit fiscal sponsor and broad employer-adjacent contribution is the most robust funding picture among the six libraries.
Release Cadence & Recent Activity#
Very active, and notably current: v1.1.3 lands around Mar 2026, making geopandas one of the freshest releases in the survey. The cadence through the 1.x line has been healthy — regular minors with features, prompt patch releases for fixes (the CVE patch shipped in 1.1.2). This is a project in active feature development, not maintenance mode.
A strategically important recent decision: geopandas dropped fiona as a core dependency in favor of pyogrio for vector I/O. This is a sign of active, opinionated stewardship — the maintainers are willing to re-platform the I/O layer for performance and a cleaner architecture rather than coast on legacy choices. (It also has downstream implications for fiona’s trajectory; see that file.)
Community Size & Ecosystem Centrality#
At ~5.2k stars geopandas is the most-starred core library here (folium is higher but is a visualization wrapper), and ~16M downloads/month places it firmly in the top tier. Its centrality is qualitative as much as quantitative: geopandas is where most users enter the stack. They rarely call shapely or pyproj directly — they call geopandas, which orchestrates them. That makes geopandas the ecosystem’s “front door” and gives it enormous gravity: tutorials, courses, Stack Overflow answers, and downstream tools (e.g., dask-geopandas, movingpandas, plotting libraries) all assume it.
Bus-Factor / Maintenance Risk#
Low. geopandas has a multi-maintainer team with overlap into the wider Toblerity/scientific-Python communities, and NumFOCUS provides organizational continuity independent of any one person. The residual risk is that geopandas depends on a chain of native libraries (GEOS via shapely, PROJ via pyproj, GDAL via pyogrio) — so its health is partly a function of their health. That dependency-chain exposure is inherent to being an integration hub and is mitigated by the fact that all of those dependencies are themselves well-maintained survey entries.
License & Lock-In#
BSD-3 — permissive, no commercial friction. Transitive native-library obligations (LGPL GEOS, GDAL/PROJ) flow through its dependencies and are the normal, well-understood situation rather than anything geopandas-specific.
Lock-in is moderate but benign: because geopandas standardizes on the
GeoDataFrame abstraction, code written against it is shaped by that API. But the
abstraction is thin and pandas-aligned, the data formats underneath are open
standards (GeoParquet, GeoJSON, GPKG, etc.), and the geometries are plain shapely
objects — so your data is never trapped even if your code is geopandas-shaped.
Security note (action required)#
CVE-2025-69662 — a SQL-injection vulnerability in to_postgis — was fixed in
1.1.2+. Any deployment that writes to PostGIS from geopandas must be on 1.1.2
or later. This is the only filed CVE among the six libraries and is the concrete
reason to treat geopandas version-pinning and upgrade discipline as a standing
requirement rather than a nicety.
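One way to turn that requirement into a startup or CI check is sketched below, using only the standard library. The helper names (`parse_release`, `is_patched`, `installed_geopandas_is_patched`) are hypothetical, and the version parsing is deliberately simplistic: it reads only leading numeric components, so a pre-release such as `1.1.2rc1` would be treated as `1.1.2`.

```python
from importlib import metadata
from typing import Optional

# First release line the survey lists as carrying the to_postgis fix.
MIN_PATCHED = (1, 1, 2)


def parse_release(version: str) -> tuple[int, ...]:
    """Extract leading numeric components, e.g. '1.1.3' -> (1, 1, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(digits))
    return tuple(parts)


def is_patched(version: str, minimum: tuple[int, ...] = MIN_PATCHED) -> bool:
    """True if the version is at or above the minimum patched release."""
    return parse_release(version) >= minimum


def installed_geopandas_is_patched() -> Optional[bool]:
    """Check the installed geopandas; None if it is not installed."""
    try:
        return is_patched(metadata.version("geopandas"))
    except metadata.PackageNotFoundError:
        return None
```

A lower bound in the dependency specification (for example `geopandas>=1.1.2`) expresses the same constraint at install time; a runtime check like this is a complement for environments that are assembled by other means.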
Migration Story#
Leaving geopandas means leaving the integration pattern, not the data. The realistic alternatives are (a) drop down to using shapely + pyproj + pyogrio directly (more code, but the same underlying objects — a smooth, partial exit), or (b) move geometry into a database (PostGIS) and treat Python as a thin client. Because geopandas reads/writes open formats and hands back shapely geometries, there is no data lock-in — a migration is a code-refactoring exercise, not a data-rescue exercise. In practice almost no one migrates away; the gravitational pull is toward geopandas, not away from it.
Verdict#
Safe long-term bet — foundational, with mandatory version hygiene. geopandas has the best governance story in the survey (NumFOCUS), the freshest release (1.1.3, ~Mar 2026), top-tier adoption, and benign lock-in. The one non-negotiable is staying current: CVE-2025-69662 makes “pin and forget” an unacceptable posture. Adopt it as the default geospatial DataFrame layer; keep it patched.
geopy — Strategic Viability#
v2.4.1 (Nov 2023; no release in ~2.5y, repo active into 2026) · ~4.8k stars · ~15M downloads/mo · MIT · org: geopy/geopy · no known CVEs
One-Line Strategic Read#
geopy is the provider-agnostic geocoding façade for Python — a single, uniform API in front of 30+ geocoding/reverse-geocoding services (Nominatim, Google, Bing, Mapbox, and many more). Its standout strategic feature is exactly its lack of churn: it is a mature, stable abstraction layer whose value is insulating your code from the messy, ever-changing world of geocoding vendors.
Project Governance & Maintainer Org#
geopy is governed by the geopy/geopy organization. It is a long-standing,
community-maintained project (one of the older libraries in the Python geo space)
with an established codebase and a clear, narrow mission. It is not
foundation-backed (no NumFOCUS/OSGeo umbrella), nor does it need to be: its scope is
deliberately bounded — wrap third-party geocoders behind a common interface — and
that scope has been essentially complete for years. Governance here looks like
careful stewardship of a finished design rather than active feature expansion.
Funding & Sponsorship#
No named corporate sponsor; geopy is volunteer/community-maintained. This would be a concern for a fast-moving library, but for a stable abstraction layer the funding question is softened: the expensive parts (geocoding, geodata, map serving) live in the third-party services geopy wraps, not in geopy itself. geopy’s maintenance cost is low — it mostly tracks provider API changes — so the modest, diffuse community investment it receives is well-matched to its actual upkeep needs.
Release Cadence & Recent Activity#
This is the dimension most likely to be misread, so it deserves care. geopy’s last release is v2.4.1 (Nov 2023) — roughly 2.5 years with no release as of this survey. Taken alone that looks alarming. But two facts reframe it: (1) the repository remains active into 2026 (issues, discussion, maintenance work), and (2) geopy is a mature, low-churn library by nature — a stable façade over external services. A geocoding-abstraction layer does not need frequent releases; it needs to keep working and to absorb the occasional provider change.
The honest strategic reading: this is maturity mode, not abandonment — but it is the kind of maturity that warrants a watch flag, because the line between “stable and finished” and “quietly neglected” is thin, and a 2.5-year release gap sits right on it. The mitigating evidence (active repo into 2026) currently lands it on the “stable” side.
Community Size & Ecosystem Centrality#
Strong: ~4.8k stars (third-highest in the survey, behind folium and geopandas) and ~15M downloads/month, comparable to geopandas. geopy’s centrality is in a different lane from the shapely/pyproj/geopandas geometry-and-CRS core: it is the default answer to “how do I turn an address into coordinates in Python.” That is a distinct, ubiquitous need, and geopy effectively owns it; there is no comparably broad, provider-agnostic competitor with anything like its adoption. That ownership gives it real gravitational durability even through quiet release periods.
Bus-Factor / Maintenance Risk#
Moderate — and the most worth-watching of the foundational-tier libraries. As a community project without foundation backing or a corporate sponsor, geopy’s continuity depends on a relatively small maintainer group, and the long release gap suggests reduced maintainer bandwidth. The risk is not that geopy stops working (a stable façade keeps working) but that it responds slowly if several underlying provider APIs change at once, or if a security issue surfaces. The mitigating factor is the low intrinsic maintenance burden: provider-tracking is incremental, and the community footprint (4.8k stars) is large enough that a serious gap would likely attract contributors or a fork.
License & Lock-In#
MIT — fully permissive, no friction. Lock-in is, by design, minimal and even inverted: geopy’s entire purpose is to be an anti-lock-in layer that lets you swap geocoding providers without rewriting your code. You are coupled to geopy’s small, uniform API, but that coupling buys you freedom from coupling to any single geocoding vendor — a net reduction in strategic risk. Note that the underlying providers carry their own terms, rate limits, and pricing; geopy abstracts the code, not the contractual relationships.
Migration Story#
The migration story is benign in both directions. Because geopy’s API is small and its job is narrow, replacing it (e.g., calling a geocoder’s SDK directly, or swapping to another abstraction) is a contained refactor touching only your geocoding call sites — not a deep architectural change. And because geopy already decouples you from any specific provider, the much more common and painful migration — switching geocoding vendors — is the one geopy is purpose-built to make trivial. Strategically, geopy reduces your worst migration risk (vendor lock-in) while keeping its own replaceability cost low. That is an excellent risk profile.
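The "contained refactor" claim can be pictured with a small sketch in which application code depends only on the uniform interface, not on a vendor. It assumes the general shape of geopy's geocoders: a `geocode(query)` method that returns an object with `latitude` and `longitude` attributes, or `None` when nothing matches. The `Geocoder` protocol, the `locate` helper, and the offline `FakeGeocoder` stand-in are all hypothetical names introduced for illustration.

```python
from types import SimpleNamespace
from typing import Optional, Protocol


class Geocoder(Protocol):
    """Anything exposing a geopy-style geocode(query) method."""

    def geocode(self, query: str): ...


def locate(geocoder: Geocoder, address: str) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if not found."""
    location = geocoder.geocode(address)
    if location is None:
        return None
    return (location.latitude, location.longitude)


class FakeGeocoder:
    """Offline stand-in used here instead of a real provider."""

    def __init__(self, known: dict[str, tuple[float, float]]):
        self._known = known

    def geocode(self, query: str):
        if query not in self._known:
            return None
        lat, lon = self._known[query]
        return SimpleNamespace(latitude=lat, longitude=lon)


provider = FakeGeocoder({"1 Example Street": (51.5, -0.12)})
hit = locate(provider, "1 Example Street")
miss = locate(provider, "Nowhere Lane")
```

Because `locate` never names a provider, switching vendors means changing only the line that constructs the geocoder, subject to each provider's own terms and rate limits.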
Verdict#
Safe to adopt, but a watch item on cadence. geopy owns the provider-agnostic geocoding niche, is MIT-licensed, widely adopted (~15M dl/mo), and structurally anti-lock-in. Its 2.5-year release gap is best read as mature low-churn (the repo is active into 2026), not abandonment — but it is the one foundational-tier library here where maintenance bandwidth genuinely bears watching. Recommended posture: use it confidently for geocoding abstraction, and periodically confirm the repo stays active and your specific providers remain supported. Its low replacement cost means even a worst-case decline would be a manageable, localized migration.
pyproj — Strategic Viability#
v3.7.2 (Aug 2025) · ~1.2k stars · ~21M downloads/mo · MIT · org: pyproj4 (PROJ / OSGeo ecosystem) · no known CVEs
One-Line Strategic Read#
pyproj is the canonical CRS and coordinate-transformation engine for Python — the Python interface to PROJ, the same C library that powers QGIS, GDAL, and essentially every serious GIS. It is a foundational, low-risk, OSGeo-anchored dependency: quietly enormous (~21M downloads/month) and effectively irreplaceable.
Project Governance & Maintainer Org#
pyproj is governed by the pyproj4 organization within the PROJ / OSGeo
ecosystem. This is a strategically important pedigree. PROJ is a project of the
OSGeo Foundation, the governance body for open-source geospatial software (it
also stewards GDAL, GEOS, QGIS, and more). pyproj being the official
Python binding into that OSGeo-governed core means its long-term direction is tied
to one of the most stable, institution-backed bodies in the entire geospatial
world. CRS handling is not something a hobbyist re-implements; it requires the
authoritative datum/transformation database that PROJ maintains, and pyproj is the
sanctioned Python door to it.
Funding & Sponsorship#
pyproj inherits the sustainability of the OSGeo/PROJ ecosystem. PROJ itself receives institutional and grant funding (geodetic agencies, OSGeo sponsorship, and contributors employed by GIS vendors and government mapping bodies have direct incentive to keep the CRS engine accurate and current). pyproj’s funding is the binding-layer reflection of that: maintained by people for whom correct coordinate transformation in Python is professionally load-bearing. This is a durable, mission-critical funding posture rather than discretionary volunteer time.
Release Cadence & Recent Activity#
Active and current. The latest release is v3.7.2 (Aug 2025), with the 3.7.x line tracking PROJ’s own releases. This coupling is a feature: when PROJ updates its datum grids or adds transformations, pyproj follows, so the cadence is naturally healthy and tied to an actively-developed upstream. No signs of drift — pyproj ships regularly and stays aligned with the authoritative C library.
Community Size & Ecosystem Centrality#
Star count (~1.2k) badly understates pyproj’s importance; downloads (~21M/month) tell the real story — second only to shapely among the core libraries in this survey, and ahead of geopandas. The reason is the same as shapely’s: pyproj is pulled in transitively by nearly everything that reprojects coordinates, including geopandas. Almost no one’s first thought is “I’ll add pyproj,” yet it ends up in the dependency tree of any project that touches CRS — which is most geospatial projects. Its centrality is structural: there is essentially one correct way to do CRS transformations in Python, and this is it.
Bus-Factor / Maintenance Risk#
Low at the ecosystem level, with the usual narrow-expertise caveat. The number of people who deeply understand PROJ internals and the C/Python binding layer is not large — CRS/geodesy is specialist knowledge. But the OSGeo governance umbrella and the professional stakes (national mapping agencies and GIS vendors need this to keep working) mean continuity does not rest on any single volunteer. The residual risk is the generic geospatial-stack concern of a finite C-extension contributor pool, mitigated here by the strongest institutional backing in the survey.
License & Lock-In#
MIT, among the most permissive licenses in the survey (shared with geopy and folium), with no copyleft friction at the binding layer. (PROJ itself is MIT/X-style as well, so unlike the shapely/GEOS-LGPL case, the transitive obligations here are minimal.) This makes pyproj exceptionally clean for commercial and embedded use.
Lock-in is conceptual rather than practical: your code expresses CRS via pyproj’s
CRS/Transformer objects, but those are built on open standards (EPSG codes,
WKT, PROJ strings) that are portable across every GIS tool. You are locked into
the standards, which is exactly where you want to be — not into pyproj as a
vendor.
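To make the portability point concrete, here is a minimal sketch, assuming pyproj is installed, in which the only CRS-specific inputs are two standard EPSG identifiers. `Transformer.from_crs` and its `always_xy` flag are part of pyproj's public API; the wrapper name `to_web_mercator` is hypothetical, and the import is deferred so the module still loads where pyproj is absent.

```python
def to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Reproject a WGS 84 lon/lat pair to Web Mercator metres."""
    # Imported lazily so this sketch can be loaded without pyproj present.
    from pyproj import Transformer

    # The CRS definitions are plain EPSG codes, understood by any GIS tool.
    # always_xy=True fixes the axis order to (lon, lat) / (x, y).
    transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
    x, y = transformer.transform(lon, lat)
    return (x, y)
```

The same two identifiers could be handed to QGIS, GDAL, or PostGIS unchanged, which is the sense in which the coupling is to the standards rather than to the library.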
Migration Story#
There is no meaningful migration target, which is the strongest possible viability statement. Any serious coordinate transformation in any language ultimately calls PROJ; “leaving pyproj” in Python means either calling PROJ through a different/lower-level binding (more work, same engine) or pushing reprojection into PostGIS/GDAL (which also use PROJ). Because CRS definitions live as portable EPSG/WKT identifiers, there is zero data lock-in — but there is also no reason to leave, because every alternative is the same PROJ engine wearing a different hat. You build on pyproj; you do not migrate off it.
Verdict#
Safe long-term bet — foundational. pyproj is among the lowest-risk dependencies available: OSGeo-governed, MIT-licensed, actively released (v3.7.2, Aug 2025), structurally central (~21M downloads/month), and with no realistic migration pressure because every alternative is the same underlying PROJ engine. Adopt without reservation; the only standing concern is the ecosystem-wide specialist-contributor pool, which OSGeo backing substantially de-risks.
S4 — Strategic Recommendation: Geographic Libraries (Python)#
Strategic Verdict (the short version)#
The Python geospatial stack is unusually layered and cohesive, and that shapes the strategic answer. Most of these libraries are not rivals — they are tiers that stack on top of one another. The right framing is not “which one wins” but “which tier is each, and how safe is the bet at that tier.”
- Foundation (build on without reservation): shapely (geometry), pyproj (CRS), geopandas (the integration hub). These three are the lowest-risk dependencies in the survey — central, actively maintained, and with no realistic migration target. You build toward them, not away from them.
- Foundation, but maintain version hygiene: geopandas carries the survey’s only filed CVE (CVE-2025-69662, to_postgis SQL injection, fixed 1.1.2+). Adopt it, but “pin and forget” is not acceptable.
- Adopt the successor for new work: for vector I/O, pyogrio is the ascendant engine (already geopandas’ core); fiona is the slowing incumbent.
- Watch items (safe today, monitor): fiona (no 2025/2026 release — maturity/slowing), geopy (no release in ~2.5y but repo active into 2026 — mature low-churn). Neither is a reason to avoid; both warrant periodic re-check.
- Safe presentation-layer bet, with a security practice: folium — adopt freely for maps; sanitize untrusted data before it reaches its HTML/JS output (XSS-sink caveat).
Viability Summary Table#
| Library | Governance / Org | Funding | Cadence (latest) | Adoption | Risk |
|---|---|---|---|---|---|
| shapely | shapely/shapely (Toblerity ecosystem) | Ecosystem gravity | Very active — v2.1.2 (Sep 2025) | ~4.5k★ / ~67M dl/mo | Low (foundational) |
| pyproj | pyproj4 (PROJ / OSGeo) | OSGeo + grant/agency | Active — v3.7.2 (Aug 2025) | ~1.2k★ / ~21M dl/mo | Low (foundational) |
| geopandas | GeoPandas (NumFOCUS) | NumFOCUS fiscal sponsor | Very active — v1.1.3 (~Mar 2026) | ~5.2k★ / ~16M dl/mo | Low + CVE patch req’d |
| pyogrio | GeoPandas-adjacent (core engine) | Coupled to geopandas | Active (ascendant) | (rising) | Low-Mod (younger, strong tailwind) |
| fiona | Toblerity | Ecosystem gravity | Slowing — v1.10.1 (Sep 2024) | ~1.2k★ / ~5.7M dl/mo | Moderate (maturity mode) |
| geopy | geopy/geopy (community) | Volunteer/community | Quiet — v2.4.1 (Nov 2023), repo active to 2026 | ~4.8k★ / ~15M dl/mo | Moderate (cadence watch) |
| folium | python-visualization/folium | Community | Active — v0.20.0 (Jun 2025) | ~7.4k★ / ~3.4M dl/mo | Low-Mod + XSS usage caveat |
★ = GitHub stars · dl/mo = PyPI downloads per month. License: shapely BSD-3 (bundled GEOS LGPL); geopandas BSD-3; fiona BSD-3; pyproj MIT; geopy MIT; folium MIT.
Safe Long-Term Bets#
shapely, pyproj, geopandas form the spine of the ecosystem. Two are quietly enormous on downloads (shapely ~67M/mo, pyproj ~21M/mo) because everything pulls them in transitively; geopandas is the front door most users enter through and is the only survey entry with formal foundation backing (NumFOCUS). All three are actively released through 2025–2026, permissively or benignly licensed, and have no realistic migration target — the strongest possible viability signal. Invest team skill here first.
pyogrio joins this tier for new vector I/O work. Its adoption as geopandas' core engine is effectively an ecosystem-level endorsement and the clearest signal of where I/O is heading.
Watch Items#
fiona — Healthy but slowing. The absent 2025/2026 release and geopandas dropping it as a core dependency mark a generational handoff to pyogrio. It remains stable and has a large installed base; there is no urgency to remove it. Action: target pyogrio for new code, monitor whether security-relevant GDAL updates keep flowing into fiona releases.
geopy — Owns the provider-agnostic geocoding niche with strong adoption (~15M dl/mo), but a ~2.5-year release gap puts it on the line between “mature and finished” and “quietly neglected.” The active-into-2026 repository currently lands it on the safe side. Action: use it confidently (it is itself an anti-lock-in layer), and periodically confirm repo activity and that your specific geocoding providers remain supported. Its low replacement cost makes any future decline a contained, localized migration.
folium — Strategically safe as a presentation layer (most-starred in the survey, active release, leaf-node position = contained failure mode). The standing requirement is security hygiene: it generates HTML/JS, so sanitize/escape any untrusted input before it reaches folium’s output, and treat that output as untrusted HTML when embedding it.
Migration Considerations#
The survey’s headline migration finding is reassuring: there is essentially no data lock-in anywhere in this stack. Every library reads and writes open, standardized formats and exchanges plain shapely geometries and EPSG/WKT CRS identifiers. Migration costs, where they exist, are code-shaped, not data-shaped.
- shapely / pyproj: No migration target exists or is needed — every alternative is the same underlying GEOS/PROJ engine via a different binding. These are pure “build on it” decisions.
- geopandas: Exiting means dropping to shapely + pyproj + pyogrio directly (more code, identical objects) or moving geometry into PostGIS — a smooth, partial, refactor-only path. No one is forced to migrate; gravity pulls inward.
- fiona → pyogrio: The one migration the ecosystem is actively executing. For users who reach I/O through geopandas it already happened transparently; direct fiona callers face a mechanical record-API → columnar-API port with zero data cost.
- geopy: Replacement is a contained refactor of geocoding call sites — and geopy’s whole purpose is to make the more painful migration (switching geocoding vendors) trivial. Net risk reducer.
- folium: Cheapest exit in the survey — swap the presentation layer, touch no data or analysis code.
Bottom Line#
Adopt shapely + pyproj + geopandas + pyogrio as the core stack with high confidence; keep geopandas patched (CVE-2025-69662). Use geopy for geocoding and folium for interactive maps, treating geopy’s release cadence and folium’s XSS-sink caveat as standing concerns. Let fiona age out gracefully toward pyogrio rather than ripping it out. The stack as a whole is low-lock-in, foundation/ecosystem-backed, and actively maintained, making it one of the safer long-term bets in open-source Python. Re-evaluate the two cadence watch items (fiona, geopy) annually.
shapely — Strategic Viability#
v2.1.2 (Sep 2025) · ~4.5k stars · ~67M downloads/mo · BSD-3 (bundled GEOS is LGPL) · org: shapely/shapely (Toblerity / GeoPandas ecosystem) · no known CVEs
One-Line Strategic Read#
shapely is the load-bearing foundation of geometry in Python — the most downloaded library in this survey by a wide margin, a hard dependency of geopandas, and the canonical Python binding to GEOS. It is as safe a long-term bet as exists in the geospatial stack.
Project Governance & Maintainer Org#
shapely lives under the shapely/shapely GitHub organization and sits inside the
broader Toblerity / GeoPandas ecosystem — the same constellation of projects that
includes fiona, rasterio, and (by close association) geopandas and pyproj. This
matters strategically: shapely is not an orphan repo maintained by one person who
happened to wrap GEOS. It is governed as part of a recognized, multi-project
geospatial-Python community with overlapping maintainers, shared release
conventions, and a clear identity.
The v2.x line (the current generation) was a deliberate, multi-year rearchitecture that moved shapely onto a vectorized, NumPy-backed core. That kind of large, coordinated rewrite is itself a governance signal: it required sustained maintainer attention and ecosystem coordination (geopandas, in particular, had to adapt), and it was executed cleanly. Projects that can land a 1.x → 2.x transition without fragmenting the community are projects with functioning governance.
Funding & Sponsorship#
shapely does not advertise a single corporate sponsor in the way some libraries do, and it is not, on its own, a NumFOCUS fiscally-sponsored project the way geopandas is. Its effective funding model is ecosystem gravity: because geopandas, fiona, rasterio, and a long tail of downstream tools depend on it, maintainer attention flows to it from people who are paid (directly or indirectly) to keep the geospatial-Python stack working. This is a more diffuse but, in practice, very durable form of sustainability — the project’s survival is coupled to the survival of the entire ecosystem above it.
Release Cadence & Recent Activity#
Very active. The current release is v2.1.2 (Sep 2025), with the 2.1.x line showing the regular patch-and-minor cadence you want from infrastructure: bug fixes shipped promptly, geometry features tracking new GEOS capabilities. There is no sign of drift or maintenance-mode quiet here — shapely is one of the most actively maintained projects in this entire survey, and the 2.x series continues to evolve rather than just receive security patches.
Community Size & Ecosystem Centrality#
This is shapely’s defining strategic attribute. At ~67M downloads/month it is the single most-downloaded library in this survey: roughly 3–4.5× pyproj, geopandas, and geopy (~21M, ~16M, and ~15M respectively), and more than an order of magnitude above fiona and folium. Star count (~4.5k) is respectable but understates its importance; downloads are the truer measure because shapely is pulled in transitively by nearly everything that touches vector geometry in Python.
Centrality cuts two ways. The upside: a library this central does not get abandoned, because too much depends on it and the community will route around any single maintainer’s departure. The (very mild) downside: its API is now a de facto standard that other libraries code against, so shapely is somewhat constrained from making breaking changes, which is exactly why the 2.x migration was handled so carefully.
Bus-Factor / Maintenance Risk#
Low. While core geometry expertise (GEOS internals, the C/NumPy boundary) is specialized and not held by a huge number of people, shapely’s position inside the Toblerity/GeoPandas community means maintenance is organizationally distributed rather than resting on one named individual. The successful 2.x rearchitecture demonstrated that the project can absorb large, coordinated work. The residual risk is the general one for any C-extension wrapper: a shrinking pool of contributors comfortable at the C/GEOS layer. That is a watch item across the whole geospatial stack, not a shapely-specific weakness.
License & Lock-In#
shapely itself is BSD 3-Clause: maximally permissive, with no friction for commercial or proprietary use. The strategic nuance is the bundled GEOS, which is LGPL. For the overwhelming majority of users (who pip install shapely and use the wheels) this is a non-issue: LGPL obligations are satisfied by dynamic linking and the ability to relink, which the standard distribution preserves. Teams that statically embed GEOS into a redistributed binary, or that ship shapely inside a closed appliance, should ask legal counsel to review the LGPL terms, but this is the normal, well-understood GEOS situation, not a shapely quirk. No known CVEs.
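For teams that want to record what they are actually shipping, a license audit can start from installed package metadata. The sketch below is illustrative only: `dependency_summary` and `bundled_geos_version` are hypothetical helper names, and the metadata fields a given release populates may vary. It reads the declared license through the standard library and, if shapely is importable, reports the GEOS version it is linked against.

```python
from importlib import metadata


def dependency_summary(dist_name):
    """Return version and license metadata for an installed distribution.

    Returns None when the distribution is not installed.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    classifiers = meta.get_all("Classifier") or []
    return {
        "name": meta.get("Name"),
        "version": meta.get("Version"),
        # Packages may declare either field, depending on their build metadata.
        "license": meta.get("License-Expression") or meta.get("License"),
        "license_classifiers": [
            c for c in classifiers if c.startswith("License ::")
        ],
    }


def bundled_geos_version():
    """Return the GEOS version string shapely reports, or None if unavailable."""
    try:
        import shapely
    except ImportError:
        return None
    return shapely.geos_version_string


print(dependency_summary("shapely"))
print(bundled_geos_version())
```

This only reports what the metadata declares. It does not establish how GEOS is linked in a custom build, which is the question a legal review would need answered for the static-embedding case.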
Migration Story#
There is no realistic reason to migrate away from shapely, and that is itself a strategic finding. shapely is the geometry layer; the question is not “what do we replace shapely with” but “what would survive if GEOS-via-Python disappeared,” which is essentially nothing in the pure-Python world. The nearest conceptual alternatives — calling GEOS through a different binding, or moving to a different geometry engine entirely (e.g., a GDAL-centric or PostGIS-centric architecture) — are not drop-in replacements and would force rewrites of every downstream consumer including geopandas. In practice, you migrate to shapely, not away from it.
If you must reduce shapely exposure for some reason (e.g., performance at extreme scale), the realistic path is to push geometry operations into a database (PostGIS) or a columnar engine, not to swap shapely for a peer Python library.
Verdict#
Safe long-term bet — foundational. shapely is the lowest-risk dependency in this survey: most-downloaded, ecosystem-central, actively maintained, permissively licensed, with no migration pressure. Build on it without reservation; the only governance hygiene worth tracking is the long-run health of the GEOS/C-extension contributor pool, which is an ecosystem-wide concern rather than a reason for hesitation.