Combinations of Content Extraction Algorithms
LWA'09: Workshop Information Retrieval. Darmstadt. 2009
Year of publication: 2009
Publication type: Miscellaneous (conference contribution)
Language: English
Abstract
Content Extraction is the task of identifying the main text content in web documents – a topic of interest in the fields of information retrieval, web mining and content analysis. We implemented an application framework to combine different algorithms in order to improve the overall extraction performance. In this paper we present details of the framework and provide some first experimental results.
Classification
DFG subject area:
Computer science
DDC subject group:
Computer science