Yahoo! is promoting an important initiative toward the democratization of research on big data. The initiative, named Webscope, gives academic researchers access to a collection of datasets, all of which have been “reviewed to conform to Yahoo!’s data protection standards” on privacy.

Among the available datasets, much space is devoted to language and graph data, but some datasets also cover topics such as advertising, marketing, and ratings.

More information about this initiative is available on the Webscope Website.

SeCo is organizing the First International Workshop on Searching and Integrating New Web Data Sources (VLDS 2011), which will take place on September 2nd as a satellite event of VLDB 2011 in Seattle, WA, USA.

The goal of the workshop is to gather researchers and practitioners from the diverse fields related to data integration and applications on the Web, with the aim of discussing innovative strategies for combining search facilities with integration aspects for Web data sources.

The workshop proceedings are now available online. You can download the single PDF file (Size 5MB) from here:

VLDS 2011 proceedings

Prof. Zicari interviewed Dr. Alon Y. Halevy, head of the Structured Data Group at Google Research, about Google Fusion Tables and the importance of large-scale data management tools.

The full transcript of the interview is available on the ODBMS.org Web site.

Continuum, a project developed and maintained under the Apache umbrella, is a continuous integration server that integrates with many popular build systems (most notably maven2) and supports automated building, testing, and releasing of applications. Continuum can be deployed either as a stand-alone server or inside an application container; this tutorial focuses on the latter scenario, since it involves some non-trivial preparation.

The objective is to deploy Continuum inside Tomcat 6 and set it up to build and test our project at every change.

The deployment environment is the following:

Debian squeeze
tomcat6        6.0.28
openjdk-6-jdk  6b18-1.8.7
maven2         2.2.1-5
subversion     1.6.12dfsg-5

The packages mentioned above can be installed and set up automatically using aptitude. Continuum, however, is not packaged and needs to be installed manually. In this tutorial we use Continuum 1.4 beta (the war; the tar.gz will also come in handy during the deployment).

Before setting up the web application, we need to set up the workspace for Continuum. Tomcat, in Debian, runs as a separate user (tomcat6) and cannot write outside its own directories. To host Continuum's configuration files, databases, work area, and local Maven repository, we need a directory that Tomcat can write to:

mkdir -p /var/lib/continuum/{conf,data,db,logs,m2}
chown -R tomcat6:tomcat6 /var/lib/continuum
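
The same preparation can be sketched as a small script that is safe to run more than once. This is only a sketch under stated assumptions: the two function names are hypothetical, the root directory defaults to the path used above, and the tomcat6 user and group are assumed to be the ones created by the Debian package.

```shell
# Create (or complete) a Continuum workspace under the given root directory.
# Hypothetical helper; the default root is the path used in this tutorial.
setup_continuum_workspace() (
    root="${1:-/var/lib/continuum}"
    for sub in conf data db logs m2; do
        mkdir -p "$root/$sub" || exit 1
    done
    # Assumption: "tomcat6" is the user/group created by the Debian package.
    # Ownership is changed only when running as root and the user exists.
    if [ "$(id -u)" -eq 0 ] && id tomcat6 >/dev/null 2>&1; then
        chown -R tomcat6:tomcat6 "$root"
    fi
)

# Report every workspace subdirectory that is missing or not writable
# by the user running the check; returns non-zero if any is found.
check_continuum_workspace() (
    root="${1:-/var/lib/continuum}"
    status=0
    for sub in conf data db logs m2; do
        if [ ! -d "$root/$sub" ] || [ ! -w "$root/$sub" ]; then
            echo "not writable: $root/$sub" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Calling setup_continuum_workspace without arguments as root prepares /var/lib/continuum; passing another path makes it possible to try the layout in a scratch directory first.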

The Data Science Toolkit is a “collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces”.

Examples of the services provided by the toolkit are:

  • Street Address to Coordinates conversion: calculates the latitude/longitude coordinates for a postal address.
    Currently restricted to the US and UK.
  • File to Text conversion: extracts text from PDFs, Word documents, and Excel spreadsheets. It also recovers text from JPEG, PNG, or TIFF images of scanned documents.
  • Coordinates to political areas conversion: returns the country, region, state, county, constituencies, and neighborhood that a point lies in.
  • GeoDict: pulls country, city, and region names from unstructured English text and returns their coordinates.
  • IP Address to Coordinates conversion: calculates the country, state, city, and latitude/longitude coordinates for IP addresses.

The toolkit also contains services for text analysis, such as the Text To People and the Text To Time services.
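
As a rough illustration of how such REST/JSON services can be called from the command line, here is a minimal sketch. It rests on assumptions: the base URL and the endpoint names (ip2coordinates, street2coordinates) are taken to be those of the toolkit's public server and may differ for a self-hosted copy, and the helper names are hypothetical. The encoder handles ASCII input only.

```shell
# Assumption: public server address; override DSTK_BASE for a local install.
DSTK_BASE="${DSTK_BASE:-http://www.datasciencetoolkit.org}"

# Percent-encode an ASCII string so it can be used as a URL path segment.
urlencode() (
    s="$1"
    out=""
    while [ -n "$s" ]; do
        rest="${s#?}"        # everything after the first character
        c="${s%"$rest"}"     # the first character itself
        s="$rest"
        case "$c" in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s' "$out"
)

# Build the request URL for a service name (assumed endpoint) and its input.
dstk_url() {
    printf '%s/%s/%s' "$DSTK_BASE" "$1" "$(urlencode "$2")"
}

# Query a service and print the JSON reply (needs curl and network access).
dstk_query() {
    curl -s "$(dstk_url "$1" "$2")"
}
```

For example, dstk_query ip2coordinates 8.8.8.8 would request the location data for that IP address, provided the server is reachable and the endpoint name matches.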

The latest version is marked as 0.35 and was released on April 17th, 2011. The Data Science Toolkit was assembled by Pete Warden, and the source code is available at http://github.com/petewarden/dstk

Researchers from the Search Computing project attended the 11th International Conference on Web Engineering (ICWE 2011), which took place in Paphos (Cyprus) on June 20-24.

Several works were presented at the conference:

  • A talk by Stefano Ceri: The Anatomy of a Multi-Domain Search Infrastructure;
  • A research paper about Multi-way rank join with parallel access;
  • A live demo of the SeCo system.

The conference also featured a SeCo-sponsored event: the First International Workshop on Search, Exploration and Navigation of Web Data Sources (ExploreWeb 2011).

Researchers from the Search Computing project attended the 2011 ACM SIGMOD Conference, which took place in Athens (Greece) on June 12-16.

A novel live demo of the SeCo environment was presented at a dedicated booth.

DEMONSTRATION

Search Computing: Multi-domain Search on Ranked Data, authored by Alessandro Bozzon, Daniele Braga, Marco Brambilla, Stefano Ceri, Francesco Corcoglioniti, Piero Fraternali, and Salvatore Vadacca.

After organizing two workshops in Como, the Search Computing project decided to go “on the road”. Several workshops have been accepted at conferences such as VLDB, ISWC, ICWE, and ECOWS. More details here, or on the workshops’ Websites.

At ICWE 2011 in Paphos, Cyprus (June) we organize the
ExploreWeb Workshop, chaired by Brambilla, Fraternali, and Schwabe, see:
http://exploreweb.search-computing.org/


At VLDB, we organize the “Very Large Data Search” Workshop, chaired
by M. Brambilla, F. Casati, and S. Ceri, with Hector Garcia-Molina
and Alon Halevy as keynote speakers, see: http://vlds2011.search-computing.net/

Also at VLDB, we sponsor the DBRank Workshop, chaired by Chakrabarti
and Martinenghi, with Jan Chomicki as speaker, see:
http://www.cs.uwaterloo.ca/conferences/dbrank/2011/

At ECOWS 2011 in Lugano, Switzerland (September) we organize the
DataView Workshop, chaired by Bozzon, Comai, and Norrie, see:
http://dataview.como.polimi.it

At ISWC 2011 in Bonn, Germany (October) we organize the
OrdRing Workshop, chaired by Della Valle, Horrocks, and Bozzon, see:
http://ordring2011.search-computing.org/

Researchers from the Search Computing project attended the 20th International World Wide Web Conference (WWW 2011), which took place in Hyderabad (India) from March 28th to April 1st.

A novel live demo of the Liquid Query search interaction paradigm was presented at a dedicated booth.

DEMONSTRATION

Exploratory search in multi-domain information spaces with Liquid Query, authored by Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Piero Fraternali, and Salvatore Vadacca.

In these times of social networks and cloud-based productivity platforms, traditional search engines fail to provide tools and services able to collect and organize one’s Web information. Greplin tries to overcome this limitation by providing a personal search engine for all the data you keep locked away in the cloud.

Greplin can interface with several Web social data services (typically, all those that provide an API), like Gmail, Google Docs, Facebook, Twitter, and Dropbox. A full list of supported data sources is available here.

At the time of writing, Greplin does not provide a public API to access its search functionality, but it looks like they are working on it.

Here is a presentation video of Greplin.

Greplin Demo from greplin on Vimeo.
