This post is part of a series [1], [2], [3], [4], [5]

Based on my earlier tests I felt confident that I could expand the size of the dataset to my intended extent. So I grabbed 12-digit HUCS for the entire United States. I was confident this would crush spatialite and finally make the response time for my question extend to close to 1 second. But just in case it didn’t, I would make my question harder at the same time.

While I’m playing with these queries I need a reference guide. Of course the starting point is the spatialite SQL guide found here http://www.gaia-gis.it/gaia-sins/spatialite-sql-3.0.0.html. But, while this is a nice list of functions, I need more help than it provides. Thankfully, spatialite shares many technical concepts with PostGIS (follows OGC, etc. SQL and data standards, and they both use the GEOS spatial library). So, I’ve been successfully poaching help from the PostGIS documentation (http://www.postgis.org/docs/ST_GeomFromText.html) … which is quite good and gives plenty of use examples.

So, getting back to the data, here is my new queryable data set.

12-digit hydrologic units for the entire US

12-digit hydrologic units for the entire US

The pink blob on the right my new, harder question.. searching with not one pair of coordinates, but a bunch of pairs in the form of a pink polygon. Pink scares computers, so this should hurt it a little. It might also be scary that my database file representing the HUCs is now 1.9GB in size (lots of coordinates and the indexes to describes them).

Pink test polygon

Pink test polygon

Because I’m likely going to be using coordinate pairs passed in from some kind of Web application, I converted the polygon to well-known text using

--- 
SELECT ST_AsText(geom) from test_polys;

which of course gives us

select a geometry (polygon) as well-known text (wkt)

With the handy text string to describe my polygon given to me, I’m able to just copy and paste it into my text SQL. So let’s do that and the first query should really hurt because I’m not going to use an index. Note that I switched from Contains.. to Intersects since I want to detect anything that touches my pink poly.

--- 
select HU_12_NAME FROM huc12
WHERE ST_Intersects(Geometry, ST_GeomFromText('POLYGON((-70.286127 43.839038, 
-70.305482 43.823696, -70.30855 43.798912, -70.292028 43.773421, 
-70.243169 43.769644, -70.231367 43.774601, -70.193601 43.811186, 
-70.186048 43.836206, -70.264648 43.839274, -70.264648 43.839274, 
-70.274089 43.840218, -70.286127 43.839038))')) = 1

How did it do? Surprisingly well. 40 seconds give or take. Of course, that won’t work for my application, so I’m keeping my fingers crossed that the index rescues me.

Polygon query with no index

Polygon query with no index

Now how about with the index? Here’s the query.

--- 
select HU_12_NAME FROM huc12
WHERE ST_Intersects(Geometry, ST_GeomFromText('POLYGON((-70.286127 43.839038, 
-70.305482 43.823696, -70.30855 43.798912, -70.292028 43.773421, 
-70.243169 43.769644, -70.231367 43.774601, -70.193601 43.811186, 
-70.186048 43.836206, -70.264648 43.839274, -70.264648 43.839274, 
-70.274089 43.840218, -70.286127 43.839038))')) = 1
AND ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'huc12'
AND search_frame = ST_GeomFromText('POLYGON((-70.286127 43.839038, 
-70.305482 43.823696, -70.30855 43.798912, -70.292028 43.773421, 
-70.243169 43.769644, -70.231367 43.774601, -70.193601 43.811186, 
-70.186048 43.836206, -70.264648 43.839274, -70.264648 43.839274, 
-70.274089 43.840218, -70.286127 43.839038))'));

And survey says! 0.186 seconds! Oh yeah.

Pink polygon query with spatial index

Pink polygon query with spatial index

So this all well and good, but the real reason why these queries are so fast is because the test geometry (the pink polygon) is so small. So lets push that a little.

Multipart polygon with lotsa geometry

Multipart polygon with lotsa geometry

So, here I’ve made a single multipart polygon with lots of vertices to keep my query simple. I’ll spare you the geometry and the query, but the pink polygon above, querying a whole country of HUC12s with the spatial index, took 1.4 seconds. So, we finally broke our time limit with enough testing geometry.