Authors: | Pappas N., Katsimpras G., Stamatatos E. |
---|
Title: | An Agent-Based Focused Crawling Framework for Topic- and Genre-Related Web Document Discovery |
---|
Conference: | 24th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2012) |
---|
Editors: | |
---|
Ed: | No |
---|
Eds: | No |
---|
Pages: | 508-515 |
---|
To appear: | No |
---|
Month: | |
---|
Year: | 2012 |
---|
Place: | |
---|
Publisher: | |
---|
Link: | |
---|
File name: | |
---|
Abstract: | The discovery of web documents about certain topics
is an important task for web-based applications including web
document retrieval, opinion mining and knowledge extraction. In
this paper, we propose an agent-based focused crawling framework
able to retrieve topic- and genre-related web documents.
Starting from a simple topic query, a set of focused crawler
agents explore in parallel topic-specific web paths using dynamic
seed URLs that belong to certain web genres and are collected
from web search engines. The agents make use of an internal
mechanism that weighs topic and genre relevance scores of
unvisited web pages. They are able to adapt to the properties
of a given topic by modifying their internal knowledge during
search, handle ambiguous queries, ignore pages irrelevant to the
topic and collaboratively retrieve topic-relevant web pages. We
performed an experimental study to evaluate the
behavior of the agents for a variety of topic queries demonstrating
the benefits and the capabilities of our framework. |
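The abstract says each agent weighs topic and genre relevance scores of unvisited pages, but it does not give the formula. The sketch below shows one way such a crawl frontier could work. It assumes a linear combination with a hypothetical weight `alpha` and a hypothetical relevance `threshold` below which pages are ignored. `Frontier`, `alpha`, `threshold`, and the example URLs are illustrative only. The sketch does not reproduce the authors' mechanism.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    # heapq is a min-heap, so the combined score is stored negated
    # to pop the most relevant page first.
    neg_score: float
    url: str = field(compare=False)


class Frontier:
    """Hypothetical priority frontier of unvisited URLs for one crawler agent."""

    def __init__(self, alpha: float = 0.5) -> None:
        # Assumption: alpha weights topic relevance, (1 - alpha) weights genre relevance.
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        self.alpha = alpha
        self._heap: list[_Entry] = []
        self._seen: set[str] = set()

    def combined_score(self, topic: float, genre: float) -> float:
        # Assumed linear weighting of the two relevance scores.
        return self.alpha * topic + (1.0 - self.alpha) * genre

    def push(self, url: str, topic: float, genre: float, threshold: float = 0.0) -> bool:
        """Queue an unvisited URL; return False if it is a duplicate or falls below threshold."""
        if url in self._seen:
            return False
        score = self.combined_score(topic, genre)
        if score < threshold:
            # Treated as irrelevant to the topic and never queued.
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, _Entry(-score, url))
        return True

    def pop(self) -> tuple[str, float]:
        """Return the highest-scoring unvisited URL and its combined score."""
        entry = heapq.heappop(self._heap)
        return entry.url, -entry.neg_score

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier(alpha=0.7)
    frontier.push("http://a.example/blog", topic=0.9, genre=0.4)
    frontier.push("http://b.example/news", topic=0.5, genre=0.9)
    frontier.push("http://c.example/shop", topic=0.1, genre=0.2, threshold=0.3)
    while frontier:
        print(frontier.pop())
```

The heap always releases the unvisited page with the highest combined score next. The threshold is one simple way to model "ignoring irrelevant pages". An adaptive agent could, for example, adjust `alpha` during the crawl, but how the paper's agents update their internal knowledge is not specified here.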