Fast Information Retrieval in the Open Grid Service Architecture
DOI: https://doi.org/10.55630/sjc.2011.5.207-236
Keywords: Grid Computing, Information Retrieval, Web Services
Abstract
In research, grid computing is an established way of providing computing resources for information retrieval. However, e-science grids also contain, process and produce documents, thereby acting as digital libraries that require means of information discovery. In this paper, we discuss how distributed information retrieval can be integrated into the Open Grid Service Architecture (OGSA) to provide efficient image retrieval for e-science grids. We identify two fundamental ways of performing information retrieval on the grid, as a batch job or as a distributed activity, and argue for the latter on grounds of efficiency. We give an analysis of the theoretical communication and computational complexity and demonstrate that bandwidth limitations provide a decisive argument in support of our case. We describe further design decisions for our system architecture and briefly compare it with other designs reported in the literature. Finally, we describe how the statelessness and isolation of web services impede data-intensive, distributed, cross-site activities in OGSA grids, and how these obstacles can be overcome.