Experiment
Experimentation at DB Level
First, we need to enabling PostGIS extension:
CREATE EXTENSION IF NOT EXISTS postgis
Create a table service_providers
CREATE TABLE "service_providers" (
"id" uuid NOT NULL,
"name" varchar(255) NOT NULL,
"physical_location_geopoints", geography
)
We have inserted ~100k
service provider records (Each Service provider was offering services at-least in 0-20 different locations.) into the column specified as geometry
as multi-points ( collection of Points, ) in our test DB ( similar resources as the the Production DB).
SELECT id, physical_location_geopoints FROM service_providers;
id | physical_location_geopoints |
-------------------------------------+-----------------------------------------------------------------------------+
261f588b-310c-4df0-9d60-46816ced78c2 | MULTIPOINT ((-98.23001 26.20341)) |
76fcecd0-a3dd-40ac-80ce-80ef559662e1 | MULTIPOINT ((-85.61003 42.94114), (-85.66809 42.96336), (-85.61003 42.94114))|
9e3081ad-81e7-4399-93b4-85c070a77117 | MULTIPOINT ((-76.28522 36.84681), (-80.27811 25.8576)) |
7e4aea52-db21-49cb-be39-2fe9b6a52e6f | MULTIPOINT EMPTY
Filter and sort all the Providers within a radius of 50,000 meters (50 km).
We wrote the equivalent query geo-distance query that looks for service providers within a radius of 50,000 meters and also applied various other filters to mimic the real use cases.
EXPLAIN (ANALYZE, COSTS)
SELECT
*
FROM providers p
WHERE st_dwithin(p.physical_locations, ST_Point(-97.733330, 30.266666, 4326), 5)
OPRDER BY st_distance(p.physical_locations, ST_Point(-97.733330, 30.266666, 4326))
Response Time
...
Planning Time: 0.308 ms
Execution Time: 26.104 ms
Without the performance optimization above query was taking between 20ms-40ms
and it was not acceptable because Elasticsearch was much faster (below snippet) for the same query with same amount of data, and the response time was between 10ms-20ms
.
Equivalent Elasticsearch query (Same amount of data)
Here is, how we were storing data into Elasticsearch:
{
"service_provider_id": "sp_123",
"name": "Amazing Service Provider",
"physical_locations": [
{
"name": "Main Office",
"coordinates": {
"lat": 30.25,
"lon": -97.75
}
},
{
"name": "Branch Office",
"coordinates": {
"lat": 30.30,
"lon": -97.70
}
},
...
],
"services": [
"plumbing",
"electrical",
"hvac"
]
}
GET services-provider-alias/_search
{
"query": {
"bool": {
"must": [
{
"geo_distance": {
"distance": "50000",
"physical_locations.coordinates": "30.266666, -97.733330"
}
}
]
}
}
}
We created a geospatial index to improve the performance of our geo-distance queries, so we created an index as follows:
CREATE INDEX service_providers_physical_location_geopoints_test_idx ON service_providers USING gist ( physical_location_geopoints );
Important Note: Our column is geography
type in the PostgreSQL table and we are going to use the same type in our queries , so we won’t have to cast explicitly. Thus in order to take advantage of a spatial index
, the query and indexed column should be the same type otherwise indexing won't work.
And, after the proper indexing and optimization, our PostGIS query response time was dropped nearly 4x times 🎉 And you can see the spatial index service_providers_physical_location_geopoints_test_idx
is being used when we analyzed the query.
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Sort (cost=195464.06..195464.66 rows=1 width=417) (actual time=7.374..7.376 rows=9 loops=1) |
Sort Key: (st_distance(physical_location_geopoints, '0101000020E610000084F4143944443E40C2340C1F119154C0'::geography, true)) |
Sort Method: quicksort Memory: 35kB |
-> Index Scan using service_providers_physical_location_geopoints_test_idx on providers p (cost=48.30..195462.86 rows=1 width=417) (actual time=1.232..7.351 rows=9 loops=1) |
Index Cond: (physical_location_geopoints && _st_expand('0101000020E610000084F4143944443E40C2340C1F119154C0'::geography, '50000'::double precision)) |
Filter: st_dwithin(service_providers.physical_location_geopoints, '0101000020E61000003ECBF3E0EE6E58C084F4143944443E40'::geography, '250000'::double precision, true) |
Rows Removed by Filter: 979 |
Planning Time: 0.303 ms |
Execution Time: 7.426 ms |
Last updated