Search Computing was widely disseminated at the WWW 2012 Conference in Lyon.

SeCo was one of the organizers of the CrowdSearch workshop at WWW 2012.

The proceedings are published by CEUR-WS here.

There, we presented a paper on a model-driven approach to crowdsourcing search results:

Alessandro Bozzon, Marco Brambilla, Andrea Mauri: A Model-Driven Approach for Crowdsourcing Search. CrowdSearch 2012 workshop at WWW 2012, CEUR-WS, vol. 842, pp. 31-35.

Here are the slides:

... and the video of the system:

The presentation received excellent feedback from the audience and complemented the other approaches presented during the workshop very well.

Furthermore, SeCo was presented in a full paper accepted in the main scientific track of WWW 2012, in the Crowdsourcing session:
Alessandro Bozzon, Marco Brambilla, Stefano Ceri: Answering search queries with CrowdSearcher. WWW 2012: 1009-1018.
The presentation given during the session is the following:

Finally, SeCo also got a poster accepted on the economic aspects of federated search:
Marco Brambilla, Sofia Ceppi, Nicola Gatti, Enrico H. Gerding: A sharing mechanism for federated search and advertising. WWW (Companion Volume) 2012: 465-466.

Here are two pictures of the poster, on display in the exhibition area during the conference:


From an outstanding group of over 200 proposals, thirty exceptional PhD students have been selected to be part of Yahoo!’s 2012 Key Scientific Challenges (KSC) Program.

Yahoo! Key Scientific Challenge Program

Sofia Ceppi, a Ph.D. student of Nicola Gatti, has been selected this year for this very competitive program, with a proposal within the Search Computing topics.

The proposal is on Federated Search Sharing and starts from the requirements of the typical search computing scenario, in which search involves multiple domains and multiple content providers. The research will build on the preliminary results achieved so far, published in a poster paper at the WWW 2012 International Conference in Lyon in April 2012.

The poster paper is available for download on the WWW 2012 conference web site here.

Within the KSC program, students will have the opportunity to work with select datasets through Yahoo!'s Webscope program and to interact with researchers.

Yahoo! is promoting a very important initiative toward the democratization of research activities with big data. This initiative, named Webscope, gives academic researchers access to a number of datasets, all of which have been “reviewed to conform to Yahoo!’s data protection standards” on privacy.

Among the available datasets, language and graph data feature prominently, but some datasets also address important topics such as advertising, marketing and rating data.

More information about this initiative is available on the Webscope Website.

SeCo is organizing the First International Workshop on Searching and Integrating New Web Data Sources (VLDS 2011), which will take place on September 2nd as a satellite event of VLDB 2011 in Seattle, WA, USA.

The goal of the workshop is to gather researchers and practitioners from the diverse fields related to data integration and applications on the web, with the purpose of discussing innovative strategies for combining search facilities with integration aspects for Web data sources.

The workshop proceedings are now available online. You can download them as a single PDF file (5 MB) from here:

VLDS 2011 proceedings

Prof. Zicari interviewed Dr. Alon Y. Halevy, head of the Structured Data Group at Google Research, on Google Fusion Tables and the importance of large-scale data management tools.

The full transcript of the interview is available on the Web site.

Continuum, a project developed and maintained under the Apache umbrella, is a continuous integration server that is fully integrated with many popular build systems (most notably maven2) and supports automated building, testing and releasing of applications. Continuum can be deployed either as a stand-alone server or inside an application container; this tutorial focuses on the latter scenario, since it involves some non-trivial preparation.

The objective is to deploy Continuum inside Tomcat 6 and set it up to build and test our project at every change.

The deployment environment is the following:

Debian squeeze
tomcat6        6.0.28
openjdk-6-jdk  6b18-1.8.7
maven2         2.2.1-5
subversion     1.6.12dfsg-5

The packages mentioned above can be installed and set up automatically using aptitude. Continuum, however, is not packaged and needs to be installed manually. In this tutorial we use Continuum 1.4 beta: we deploy the WAR distribution, but the tar.gz distribution will also come in handy during deployment.

Before setting up the web application, we need to set up the workspace for Continuum. On Debian, Tomcat runs as a separate user (tomcat6) and is not able to write outside its own directories. To host the Continuum configuration files, databases, work area, and maven local repository, we need a directory that Tomcat can write to:

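# Create the Continuum workspace and its subdirectories (run as root, in bash)
mkdir -p /var/lib/continuum/{conf,data,db,logs,m2}
# Hand ownership to the Tomcat user so the web application can write there
chown -R tomcat6:tomcat6 /var/lib/continuum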
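
As a quick sanity check of the workspace created above, the following sketch defines a small helper that reports any of the five subdirectories that is missing or not writable by the current user. It is not part of Continuum or Tomcat: the function name check_workspace is hypothetical and introduced here only for illustration.

```shell
# Hypothetical helper (an assumption for illustration, not a Continuum tool):
# verify that each expected workspace subdirectory exists and is writable
# by the user running the check.
check_workspace() {
  local base="$1" failed=0
  for d in conf data db logs m2; do
    if [ ! -d "$base/$d" ] || [ ! -w "$base/$d" ]; then
      echo "missing or not writable: $base/$d"
      failed=1
    fi
  done
  # Exit status 0 when every directory is usable, 1 otherwise
  return "$failed"
}

# Example call, to be run as the tomcat6 user:
#   check_workspace /var/lib/continuum
```

The check is only informative when run as the tomcat6 user: root can usually write anywhere, so running it as root would not reveal a wrong ownership.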

The Data Science Toolkit is a “collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces”.

Examples of the services provided by the toolkit are:

  • Street Address to Coordinates conversion: calculates the latitude/longitude coordinates for a postal address.
    Currently restricted to the US and UK.
  • File to Text conversion: extracts text from PDFs, Word Documents, Excel Spreadsheets. It also recovers text from JPEG, PNG or TIFF images of scanned documents.
  • Coordinates to political areas conversion: returns the country, region, state, county, constituencies and neighborhood a point is inside.
  • GeoDict: pulls country, city and region names from unstructured English text, and returns their coordinates.
  • IP Address to Coordinates conversion: calculates country, state, city and latitude/longitude coordinates for IP addresses.

The toolkit also contains services for text analysis, such as the Text To People and the Text To Time services.

The latest version is 0.35, released on April 17th, 2011. The Data Science Toolkit was assembled by Pete Warden and the source code is available at

Researchers from the Search Computing project attended the 11th International Conference on Web Engineering (ICWE 2011), which took place in Paphos (Cyprus) on June 20-24.

Several works were presented at the conference:

  • A talk by Stefano Ceri: The Anatomy of a Multi-Domain Search Infrastructure;
  • A research paper about Multi-way rank join with parallel access;
  • A live demonstration of the SeCo system.

The conference also featured a SeCo-sponsored event: the First International Workshop on Search, Exploration and Navigation of Web Data Sources (ExploreWeb 2011).

Researchers from the Search Computing project attended the 2011 ACM SIGMOD Conference, which took place in Athens (Greece) on June 12-16.

A novel live demonstration of the SeCo environment was presented at a dedicated booth.


Search Computing: Multi-domain Search on Ranked Data, authored by Alessandro Bozzon, Daniele Braga, Marco Brambilla, Stefano Ceri, Francesco Corcoglioniti, Piero Fraternali, Salvatore Vadacca

After organizing two workshops in Como, the Search Computing project decided to go “on the road”. Several workshops have been successfully proposed to conferences such as VLDB, ISWC, ICWE, and ECOWS. More details here, or on the workshops’ Websites.

- At ICWE 2011 in Paphos, Cyprus (June) we organize the ExploreWeb Workshop, chaired by Brambilla, Fraternali, and Schwabe, see:

- At VLDB, we organize the “Very Large Data Search” Workshop, chaired by M. Brambilla, F. Casati, and S. Ceri, with Hector Garcia-Molina and Alon Halevy as keynote speakers, see:

- Also at VLDB, we sponsor a workshop chaired by Chackrabarti and Martinenghi, with Jan Chomicki as speaker, see:

- At ECOWS 2011 in Lugano, Switzerland (September) we organize a workshop chaired by Bozzon, Comai, and Norrie, see:

- At ISWC 2011 in Bonn, Germany (October) we organize a workshop chaired by Della Valle, Horrocks and Bozzon, see:

