Alessandro Bozzon, Marco Brambilla, Andrea Mauri: A Model-Driven Approach for Crowdsourcing Search. Crowdsearch 2012 workshop at WWW 2012, CEUR-WS, vol. 842, pp. 31-35
The presentation received excellent feedback from the audience and complemented the other approaches presented during the workshop very well.
Furthermore, SeCo was presented in a full paper accepted in the main scientific track of WWW 2012, in the Crowdsourcing session:
Alessandro Bozzon, Marco Brambilla, Stefano Ceri: Answering search queries with CrowdSearcher. WWW 2012: 1009-1018.
The presentation given during the session is the following:
Finally, SeCo also got a poster accepted on the economic aspects of federated search:
Marco Brambilla, Sofia Ceppi, Nicola Gatti, Enrico H. Gerding: A revenue sharing mechanism for federated search and advertising. WWW (Companion Volume) 2012: 465-466.
Here are two pictures of the displayed poster, on show during the conference in the exhibition area:
Sofia Ceppi, a Ph.D. student of Nicola Gatti, has been chosen to receive this very competitive award this year, on a proposal within the Search Computing topics.
The proposal is on Federated Search Revenue Sharing and starts from the requirements imposed by the typical search computing scenario, in which search spans multiple domains and multiple content providers. The research will build on the preliminary results achieved so far and published in a poster paper at the WWW 2012 International Conference in Lyon in April 2012.
The poster paper is available for download on the WWW 2012 conference web site here.
Within the KSC program, students will have the opportunity to work with select datasets through our Webscope program and to interact with Yahoo! researchers.
Yahoo! is promoting a very important initiative toward the democratization of research on big data. This initiative, named Webscope, gives academic researchers access to a number of datasets, all of which have been “reviewed to conform to Yahoo!’s data protection standards” on privacy.
Among the available datasets, much space is devoted to language and graph data, but some datasets also address important topics such as advertising, marketing and rating data.
More information about this initiative is available on the Webscope Website.
SeCo is organizing the First International Workshop on Searching and Integrating New Web Data Sources (VLDS 2011), which will take place on September 2nd as a satellite event of VLDB 2011 in Seattle, WA, USA.
The goal of the workshop is to gather researchers and practitioners from the diverse fields related to data integration and search applications on the Web, with the aim of discussing innovative strategies for combining search facilities with integration aspects for Web data sources.
The workshop proceedings are now available online. You can download the single PDF file (Size 5MB) from here:
Prof. Zicari interviewed Dr. Alon Y. Halevy, head of the Structured Data Group at Google Research, on Google Fusion Tables and the importance of large scale data management tools.
The full transcript of the interview is available on the ODBMS.org Web site.
Continuum, a project developed and maintained under the Apache umbrella, is a continuous integration server that integrates with many popular build systems (most notably Maven 2) and supports automated building, testing, and releasing of applications. Continuum can be deployed either as a stand-alone server or inside an application container; this tutorial focuses on the latter scenario, since it involves some non-trivial preparation.
The objective is to deploy Continuum inside Tomcat 6 and set it up to build and test our project at every change.
The package mentioned above can be installed and set up automatically using aptitude. Continuum, however, is not packaged and must be installed manually. In this tutorial we use Continuum 1.4 beta (the WAR distribution, although the tar.gz will come in handy during deployment).
Before setting up the web application, we need to set up the workspace for Continuum. On Debian, Tomcat runs as a separate user (tomcat6) and cannot write outside its own directories. To host the Continuum configuration files, databases, work area, and local Maven repository, we need a directory that Tomcat can write to:
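As a minimal sketch, such a workspace could be prepared along the following lines. The subdirectory names are hypothetical (one per item listed above) and are not Continuum's required layout; the demo uses a temporary directory so that it runs without root privileges.

```python
import os
import tempfile
from pathlib import Path
from typing import List

# Hypothetical subdirectory names: configuration, databases,
# work area, and local Maven repository. Adjust to your setup.
SUBDIRS = ("conf", "database", "work", "m2-repository")


def prepare_workspace(base: Path) -> List[Path]:
    """Create the workspace tree under `base` and return the directories."""
    created = []
    for name in SUBDIRS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


def is_writable(path: Path) -> bool:
    """Check that the current user can create entries inside `path`."""
    return os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Demo location only; a real deployment would use a fixed path.
    base = Path(tempfile.mkdtemp(prefix="continuum-"))
    for directory in prepare_workspace(base):
        status = "writable" if is_writable(directory) else "NOT writable"
        print(directory, status)
```

On a real Debian system the tree would also have to be owned by the Tomcat user, for example by calling `shutil.chown(path, user="tomcat6", group="tomcat6")` as root on each directory.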
The Data Science Toolkit is a “collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces”.
Examples of the services provided by the toolkit are:
Street Address to Coordinates conversion: calculates the latitude/longitude coordinates for a postal address.
Currently restricted to the US and UK.
File to Text conversion: extracts text from PDFs, Word documents, and Excel spreadsheets. It also recovers text from JPEG, PNG or TIFF images of scanned documents.
Coordinates to political areas conversion: returns the country, region, state, county, constituencies and neighborhood a point is inside.
GeoDict: it pulls country, city and region names from unstructured English text, and returns their coordinates.
IP Address to Coordinates conversion: it calculates country, state, city and latitude/longitude coordinates for IP addresses.
The toolkit also contains services for text analysis, such as the Text To People and the Text To Time services.
The latest version is marked as 0.35 and was released on April 17th, 2011. The Data Science Toolkit was assembled by Pete Warden and the source code is available at http://github.com/petewarden/dstk
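To give a rough idea of how a REST/JSON service of this kind might be called, here is a minimal sketch. The base URL, the `ip2coordinates` and `street2coordinates` path names, and the shape of the sample response are all assumptions made for illustration, so check them against the toolkit's own documentation. The sketch only builds request URLs and parses a made-up response, without contacting any server.

```python
import json
from urllib.parse import quote

# Assumed base URL of a toolkit server; replace with your own instance.
BASE_URL = "http://www.datasciencetoolkit.org"


def build_url(service: str, query: str, base_url: str = BASE_URL) -> str:
    """Build a GET URL of the assumed form <base>/<service>/<encoded query>."""
    return f"{base_url}/{service}/{quote(query, safe='')}"


def extract_coordinates(body: str) -> dict:
    """Map each input key to (latitude, longitude), skipping null results.

    Assumes the response is a JSON object keyed by the original input.
    """
    result = {}
    for key, info in json.loads(body).items():
        if info is not None:
            result[key] = (info["latitude"], info["longitude"])
    return result


# Made-up response for illustration only (documentation IP addresses).
SAMPLE_RESPONSE = (
    '{"192.0.2.1": {"latitude": 45.0, "longitude": 9.0}, "192.0.2.2": null}'
)

if __name__ == "__main__":
    print(build_url("ip2coordinates", "192.0.2.1"))
    print(build_url("street2coordinates", "1 Main St, Springfield"))
    print(extract_coordinates(SAMPLE_RESPONSE))
```

Keeping URL construction and response parsing as separate pure functions makes it easy to plug in any HTTP client and to test the parsing logic offline.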
The conference also featured a SeCo-sponsored event: the First International Workshop on Search, Exploration and Navigation of Web Data Sources (ExploreWeb 2011).
After organizing two workshops in Como, the Search Computing project decided to go “on the road”. Several workshop proposals have been accepted at conferences such as VLDB, ISWC, ICWE, and ECOWS. More details here, or on the workshops’ Websites.
- At VLDB, we organize the “Very Large Data Search” Workshop, chaired by M. Brambilla, F. Casati, and S. Ceri, with Hector Garcia Molina and Alon Halevy as keynote speakers, see: http://vlds2011.search-computing.net/
- At ECOWS 2011 in Lugano, Switzerland (September), we organize the DATAVIEW Workshop, chaired by Bozzon, Comai, and Norrie, see: http://dataview.como.polimi.it