CREATE TABLE "service_providers" (
"id" uuid NOT NULL,
"name" varchar(255) NOT NULL,
"physical_location_geopoints", geography
)
We have inserted ~100k service provider records (Each Service provider was offering services at-least in 0-20 different locations.) into the column specified as geometry as multi-points ( collection of Points, ) in our test DB ( similar resources as the the Production DB).
Filter and sort all the Providers within a radius of 50,000 meters (50 km).
We wrote the equivalent query geo-distance query that looks for service providers within a radius of 50,000 meters and also applied various other filters to mimic the real use cases.
EXPLAIN (ANALYZE, COSTS)
SELECT
*
FROM providers p
WHERE st_dwithin(p.physical_locations, ST_Point(-97.733330, 30.266666, 4326), 5)
OPRDER BY st_distance(p.physical_locations, ST_Point(-97.733330, 30.266666, 4326))
Response Time
...
Planning Time: 0.308 ms
Execution Time: 26.104 ms
Without the performance optimization above query was taking between 20ms-40ms and it was not acceptable because Elasticsearch was much faster (below snippet) for the same query with same amount of data, and the response time was between 10ms-20ms.
Equivalent Elasticsearch query (Same amount of data)
Here is, how we were storing data into Elasticsearch:
We created a geospatial index to improve the performance of our geo-distance queries, so we created an index as follows:
CREATE INDEX service_providers_physical_location_geopoints_test_idx ON service_providers USING gist ( physical_location_geopoints );
Important Note: Our column is geography type in the PostgreSQL table and we are going to use the same type in our queries , so we wonβt have to cast explicitly. Thus in order to take advantage of a spatial index, the query and indexed column should be the same type otherwise indexing won't work.
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Sort (cost=195464.06..195464.66 rows=1 width=417) (actual time=7.374..7.376 rows=9 loops=1) |
Sort Key: (st_distance(physical_location_geopoints, '0101000020E610000084F4143944443E40C2340C1F119154C0'::geography, true)) |
Sort Method: quicksort Memory: 35kB |
-> Index Scan using service_providers_physical_location_geopoints_test_idx on providers p (cost=48.30..195462.86 rows=1 width=417) (actual time=1.232..7.351 rows=9 loops=1) |
Index Cond: (physical_location_geopoints && _st_expand('0101000020E610000084F4143944443E40C2340C1F119154C0'::geography, '50000'::double precision)) |
Filter: st_dwithin(service_providers.physical_location_geopoints, '0101000020E61000003ECBF3E0EE6E58C084F4143944443E40'::geography, '250000'::double precision, true) |
Rows Removed by Filter: 979 |
Planning Time: 0.303 ms |
Execution Time: 7.426 ms |
And, after the proper indexing and optimization, our PostGIS query response time was dropped nearly 4x times And you can see the spatial index service_providers_physical_location_geopoints_test_idx is being used when we analyzed the query.