Abstract: Spatial join aggregate (SJA) is a commonly used but time-consuming operation in spatial databases. Given the deluge of spatial data, SJA is difficult to perform on a single machine, so the design of efficient distributed SJA algorithms is receiving increasing attention. Because Map-Reduce assumes sequential scans of its input, it is usually used to accelerate non-indexed spatial join queries, and no previous work processes SJA with both Map-Reduce and an R-tree spatial index. This paper therefore proposes a novel algorithm, R-tree based Spatial Join Aggregate with Map-Reduce (RSJA-MR), which returns results more efficiently. A distributed R-tree index structure is presented to index large-scale spatial data. RSJA-MR first uses the distributed R-tree to generate tasks. These tasks can be computed independently in parallel and are easily expressed in Map-Reduce. An index cache mechanism is provided to support concurrent access to the R-tree index. The experimental results show that, compared with non-indexed SJA, RSJA-MR improves time performance by at least 8% for the spatial intersection join aggregate and by at least 35% for the spatial containment join aggregate.
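
The task decomposition described above can be illustrated with a minimal sketch. It is not the RSJA-MR implementation: x-sorted leaf pages stand in for the distributed R-tree, an in-process loop stands in for Map-Reduce, the aggregate is assumed to be a COUNT over an intersection join, and every function name is hypothetical. Rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples keyed by an identifier.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(rects):
    """Minimum bounding rectangle of a collection of rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def build_leaves(rects, capacity):
    """Pack rectangles into leaf pages ordered by xmin.

    Assumption: a simple stand-in for R-tree leaf nodes, not a real R-tree.
    Each object lands in exactly one leaf.
    """
    ordered = sorted(rects.items(), key=lambda kv: kv[1][0])
    leaves = []
    for i in range(0, len(ordered), capacity):
        page = dict(ordered[i:i + capacity])
        leaves.append((mbr(page.values()), page))
    return leaves


def generate_tasks(leaves_r, leaves_s):
    """One task per pair of leaves whose bounding rectangles intersect."""
    return [
        (page_r, page_s)
        for (mbr_r, page_r), (mbr_s, page_s) in product(leaves_r, leaves_s)
        if intersects(mbr_r, mbr_s)
    ]


def map_task(task):
    """Map step: emit (r_id, partial count) for one independent task."""
    page_r, page_s = task
    for r_id, r in page_r.items():
        n = sum(1 for s in page_s.values() if intersects(r, s))
        if n:
            yield r_id, n


def reduce_counts(pairs):
    """Reduce step: sum the partial counts per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def sja_count(rects_r, rects_s, capacity=2):
    """Count, for each object in R, the objects in S that intersect it."""
    tasks = generate_tasks(build_leaves(rects_r, capacity),
                           build_leaves(rects_s, capacity))
    return reduce_counts(kv for task in tasks for kv in map_task(task))
```

Because each object belongs to exactly one leaf, every candidate pair is examined in at most one task, so the partial counts can be summed without deduplication. Leaf pairs whose bounding rectangles do not intersect never become tasks, which is where an index can save work over a full scan.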