How much semantics on the “wild” Web is enough for machines to help us? ⋆

Mária Bieliková
Institute of Informatics and Software Engineering, Slovak University of Technology
Ilkovicova 3, 842 16 Bratislava, Slovakia
maria.bielikova@fiit.stuba.sk
WWW home page: http://fiit.stuba.sk/~bielik

Abstract. The current Web is not only a place for content available at any time and in any location. It is also a place where we actually spend time performing our work tasks, a place where we look not only for interesting information but also for entertainment and friends, and a place where we spend part of our free time. The Web is also an infrastructure for applications that offer various services. The Web has so many aspects that this diverse organism is studied by researchers from various disciplines. In this paper we concentrate on the information retrieval aspect of the Web, which still prevails. How can we improve information retrieval, be it goal-driven or exploratory? To what extent are we able to give our machines the means to help us in information retrieval tasks? Is there any level of semantics that we can supply for the Web in general and that will help? We present some aspects of information acquisition by search on the “wild” Web, together with examples of approaches to particular tasks aimed at improving information search. These approaches were proposed in the last two years within the Institute of Informatics and Software Engineering at the Slovak University of Technology, especially within the PeWe (Personalized Web) research group.

1 Introduction

The Web is amazing in the diversity of its content and in the sheer number of thoughts, discussions, and opinions it holds, which in many cases show the wisdom and creativity of people.
This diversity is also the bottleneck of the current Web. By its nature, the Web involves “web objects” of various types (text, multimedia, programs) that represent conceptually different entities (content, people, things, services) and change constantly. Particular objects are not formally defined; for example, the content is semistructured, which makes machine processing complex.

Some obvious statements are expected here: how important the Web is for our lives (both work and private), how the Web grows, how dynamic and constantly changing it is, and how it absorbs people with their opinions, ratings and tags¹. Its dynamic nature in particular prevents us from directly employing most methods developed for closed information worlds (even large ones, or ones actually present on the Web). Its size, in turn, requires automatic (or semiautomatic) approaches to information acquisition from this large, heterogeneous information space.

The Web is undergoing constant development, driven by
– the Semantic Web initiative, which aims for a machine-readable representation of the Web [3],
– the Adaptive Web initiative, which stresses the need for personalization and broader context adaptation on the Web [6],
– the Web 2.0 initiative, also called the Social Web, which focuses on social and collaborative aspects of the Web [14].

Development in this area has matured to the point where the Web, an important and in fact still largely unknown phenomenon, is identified as a separate, original object of investigation; there are even initiatives that aim to establish Web Science as a new scientific discipline [7].

⋆ This work was partially supported by the projects VEGA 1/0508/09 and KEGA 028-025STU-4/2010, and it is a partial result of the Research & Development Operational Programme for the project SMART II, ITMS 26240120029, co-funded by the ERDF.
Information retrieval based on search (be it goal-driven or exploratory) also requires effective means for expressing users’ information needs: how should a user specify his query or the broader aim of the search (be it a concrete requirement for an explanation of a particular term or an abstract need to find out what is interesting or new in some domain)? “Effective” here means that the user gets what he expects, even if his expectations are not completely known. This is quite similar to software requirements specification, but on the “wild” Web there are so many and such diverse users with various needs that we cannot do this manually, as software engineers do with a software specification.

In general, a user’s information needs usually arise while the user solves a task. Information needs can be classified into three categories [5]:

¹ We do not mention or elaborate further on another important view of the Web, namely as an infrastructure for services and software applications.