A pre-print from Tim Finin and Li Ding entitled "Search Engines for Semantic Web Knowledge"1 presents a thoughtful and experienced overview of the challenges posed to conventional search by semantic Web constructs. The authors base many of their observations on their experience with the Swoogle semantic Web search engine over the past two years. They also used Swoogle, whose index contains information on over 1.3M RDF documents, to generate the paper's statistics on the size and growth of the semantic Web.
Among other points, the authors note these key differences from, and challenges to, conventional search engines:
- Harvesting — the need to selectively discover semantic Web documents and to accurately index their semi-structured components
- Search — the need for search to cover a broader range than documents in a repository, going from the universal down to the atomic granularity of a single triple (see the sketch after this list). Path tracing and the provenance of the information may also be important
- Rank — results ranking needs to account for the contribution of the semi-structured data, and
- Archive — more versioning and tracking is needed, since underlying ontologies will surely grow and evolve.
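The triple-level granularity mentioned in the Search point is worth making concrete. The snippet below is a minimal sketch of my own (it is not drawn from the paper and is not how Swoogle itself is implemented), using the Python rdflib library to show how a query can target individual subject-predicate-object statements rather than whole documents:

```python
from rdflib import Graph
from rdflib.namespace import FOAF

# A tiny RDF document: two FOAF statements about one resource
sample = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> foaf:name "Alice" ;
                           foaf:knows <http://example.org/bob> .
"""

g = Graph()
g.parse(data=sample, format="turtle")

# Conventional search returns whole documents; a semantic Web search
# engine must also answer at the granularity of a single triple, e.g.
# "which resources have a foaf:name?"
for subject, _, name in g.triples((None, FOAF.name, None)):
    print(subject, "is named", name)

# The same question expressed as a SPARQL query over the parsed graph
for row in g.query(
    "SELECT ?s ?name WHERE { ?s foaf:name ?name }",
    initNs={"foaf": FOAF},
):
    print(row.s, row.name)
```

Even this toy example hints at the harvesting and indexing challenge above: the engine has to parse and index the structure, not just the text, before triple-level queries like these can be answered at Web scale.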
The authors particularly note the challenge of indexing as repositories grow to true Internet scale.
Though the authors do not note it, I would add the challenge of user interfaces to this list. Only a small percentage of users, for example, use Google's more complicated advanced search form. In a full-blown implementation, the variations possible in semantic Web search could make Google's advanced form look like child's play.
1 Tim Finin and Li Ding, "Search Engines for Semantic Web Knowledge," a pre-print to be published in the Proceedings of XTech 2006: Building Web 2.0, May 16, 2006, 19 pp. A PDF of the paper is available for download.