[SPARK-55790][Geo][SQL] Build a complete SRS registry using PROJ 9.7.1 data by uros-db · Pull Request #54571 · apache/spark

uros-db · 2026-03-02T09:52:39Z

What changes were proposed in this pull request?

Build a more complete Spatial Reference System (SRS) registry in Spark by introducing 10000+ entries sourced from the PROJ library's EPSG and ESRI databases. This substantially broadens geospatial type support.

Why are the changes needed?

This will enable Geometry and Geography types to support 10000+ additional SRID/CRS values, in both the JVM and Python. Currently, Geometry and Geography types offer only limited SRID support (a few hardcoded values).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The script and auto-generated golden files are self-contained.

Was this patch authored or co-authored using generative AI tooling?

Yes, Claude 4.6 Opus.

uros-db

@szehon-ho @cloud-fan Please review.

uros-db

Docker integration failure seems unrelated to these changes.

szehon-ho

The PR has two generated CSV files, which are very big. Just wondering, is there no way to have a single generated one and copy it at build time?
Do we plan to run it just manually? I guess PROJ does not release frequently enough for it to be worth automating?

szehon-ho · 2026-03-02T22:48:51Z

dev/generate_srs_registry.py

@@ -0,0 +1,377 @@
+#!/usr/bin/env python3


should we use hyphen delimiter for this script name, like other dev scripts?

szehon-ho · 2026-03-02T22:49:18Z

dev/generate_srs_registry.py

+    url = PROJ_RAW_URL.format(version=version, filename=filename)
+    print(f"  Downloading {url}")
+    try:
+        with urllib.request.urlopen(url) as response:


Do we want to add an overridable timeout, so the script does not hang (if we ever call it from a build job, for example)?
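One way this suggestion could look, as a minimal sketch: `urllib.request.urlopen` accepts a `timeout` argument in seconds, and an environment variable can override the default. The `SRS_DOWNLOAD_TIMEOUT` variable, the 30-second default, and the helper names are hypothetical and not part of the PR.

```python
import os
import urllib.request

# Hypothetical default and environment variable name (not from the PR).
DEFAULT_TIMEOUT_SECONDS = 30.0
TIMEOUT_ENV_VAR = "SRS_DOWNLOAD_TIMEOUT"


def resolve_timeout(env=None):
    """Return the timeout in seconds, preferring the environment override."""
    env = os.environ if env is None else env
    raw = env.get(TIMEOUT_ENV_VAR)
    if raw is None:
        return DEFAULT_TIMEOUT_SECONDS
    timeout = float(raw)
    if timeout <= 0:
        raise ValueError(f"{TIMEOUT_ENV_VAR} must be positive, got {raw!r}")
    return timeout


def download(url, timeout=None):
    """Fetch a URL as UTF-8 text, passing a timeout to urlopen."""
    if timeout is None:
        timeout = resolve_timeout()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A build job could then set `SRS_DOWNLOAD_TIMEOUT=10` to fail fast, while manual runs keep the default.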

szehon-ho · 2026-03-02T22:50:41Z

dev/generate_srs_registry.py

+        'geocentric', 'other'            -> non-geographic
+
+    Returns a list of (srid, string_id, is_geographic) tuples,
+    excluding deprecated entries and entries with non-numeric codes.


Is this right? I don't see the script filtering deprecated entries.
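If the docstring is meant to hold, the missing filter might look like the sketch below. It assumes that each parsed row carries the code in its second field and a deprecated flag (0 or 1) in its last field; that layout should be checked against the PROJ SQL files before relying on it. The `keep_entry` helper and the sample rows are made up for illustration.

```python
def keep_entry(fields):
    """Return True for a non-deprecated row with a numeric code.

    Assumed layout (hypothetical): fields[1] is the code and the last
    field is the deprecated flag, stored as 0 or 1.
    """
    code = str(fields[1])
    deprecated = str(fields[-1]).strip() == "1"
    return code.isdigit() and not deprecated


# Made-up rows, only to exercise the filter.
rows = [
    ["EPSG", "1001", "example current entry", 0],
    ["EPSG", "1002", "example deprecated entry", 1],
    ["ESRI", "ABC", "example non-numeric code", 0],
]
kept = [row for row in rows if keep_entry(row)]
```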

szehon-ho · 2026-03-02T22:51:11Z

dev/generate_srs_registry.py

+    try:
+        with urllib.request.urlopen(url) as response:
+            return response.read().decode("utf-8")
+    except urllib.error.HTTPError as e:


Should we catch other errors as well, like URLError?
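A sketch of broader handling, under the assumption that the script should report one error type to its caller. `urllib.error.HTTPError` is a subclass of `urllib.error.URLError`, so the more specific handler has to come first. The `DownloadError` wrapper and the 30-second default are hypothetical.

```python
import socket
import urllib.error
import urllib.request


class DownloadError(RuntimeError):
    """Hypothetical wrapper so callers handle a single exception type."""


def download(url, timeout=30.0):
    """Fetch a URL as UTF-8 text, converting network failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (404, 500, ...).
        raise DownloadError(f"HTTP {e.code} for {url}: {e.reason}") from e
    except urllib.error.URLError as e:
        # No usable response: DNS failure, refused connection, missing file.
        raise DownloadError(f"Cannot reach {url}: {e.reason}") from e
    except socket.timeout as e:
        # A read that stalls after the connection is made can raise this.
        raise DownloadError(f"Timed out reading {url}") from e
```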

szehon-ho · 2026-03-02T22:52:02Z

dev/generate_srs_registry.py

+
+    print()
+
+    # Deduplicate: keep the first occurrence of each SRID.


Does this happen? Should we add a comment saying which authority takes precedence (EPSG or ESRI)?
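To make the precedence explicit rather than a side effect of iteration order, the deduplication could be sketched as below. The `(srid, string_id, is_geographic)` tuple shape follows the docstring quoted earlier; the EPSG-before-ESRI order, the function name, and the sample entries are assumptions for illustration.

```python
def deduplicate(entries_by_authority, precedence=("EPSG", "ESRI")):
    """Keep one entry per SRID; earlier authorities in `precedence` win."""
    seen = {}
    for authority in precedence:
        for entry in entries_by_authority.get(authority, []):
            srid = entry[0]
            # setdefault keeps the first entry stored for a given SRID.
            seen.setdefault(srid, entry)
    return sorted(seen.values())


# Made-up entries: SRID 1001 appears under both authorities.
sample = {
    "ESRI": [(1001, "ESRI:1001", True), (2002, "ESRI:2002", False)],
    "EPSG": [(1001, "EPSG:1001", True)],
}
result = deduplicate(sample)
```

With the assumed order, the EPSG entry for SRID 1001 is kept and the ESRI duplicate is dropped, regardless of the order in which the files were parsed.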

szehon-ho · 2026-03-02T22:52:24Z

dev/generate_srs_registry.py

+        if not match:
+            continue
+        fields = parse_sql_values(match.group(1))
+        auth_name = fields[0]


nit: add a check for the length of `fields`
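A minimal sketch of such a guard, assuming the parser needs at least an authority name and a code from each row. The `MIN_FIELDS` value and the `extract_auth_and_code` helper are hypothetical; the real minimum depends on which columns the script reads.

```python
# Hypothetical minimum: auth_name and code.
MIN_FIELDS = 2


def extract_auth_and_code(fields, min_fields=MIN_FIELDS):
    """Return (auth_name, code), or None when the row is too short."""
    if len(fields) < min_fields:
        # Skip malformed rows instead of failing with an IndexError.
        return None
    return fields[0], fields[1]
```

Returning `None` lets the caller `continue` past malformed rows, matching how the quoted loop already skips lines that do not match.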

Initial commit

9846d10

Fix lint

e258c79

uros-db wants to merge 2 commits into apache:master from uros-db:geo-proj
