Browsing Posts published by Alessandro Bozzon

Yahoo! is promoting a very important initiative toward the democratization of research activities with big data. The initiative, named Webscope, gives academic researchers access to a number of datasets, all of which have been “reviewed to conform to Yahoo!’s data protection standards” on privacy.

Among the available datasets, most of the space is devoted to language and graph data, but some datasets also cover important topics such as advertising, marketing and rating data.

More information about this initiative is available on the Webscope Website.

Prof. Zicari interviewed Dr. Alon Y. Halevy, head of the Structured Data Group at Google Research, on Google Fusion Tables and the importance of large-scale data management tools.

The full transcript of the interview is available on the ODBMS.org Web site.

The Data Science Toolkit is a “collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces”.

Examples of the services provided by the toolkit are:

  • Street Address to Coordinates conversion: calculates the latitude/longitude coordinates for a postal address.
    Currently restricted to the US and UK.
  • File to Text conversion: extracts text from PDFs, Word Documents, and Excel Spreadsheets. It also recovers text from JPEG, PNG or TIFF images of scanned documents.
  • Coordinates to political areas conversion: returns the country, region, state, county, constituencies and neighborhood a point is inside.
  • GeoDict: pulls country, city and region names from unstructured English text and returns their coordinates.
  • IP Address to Coordinates conversion: it calculates country, state, city and latitude/longitude coordinates for IP addresses.

The toolkit also contains services for text analysis, such as the Text To People and the Text To Time services.
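
As a rough illustration of how such REST/JSON services could be called from Python, here is a minimal sketch. The base URL, the endpoint names (`street2coordinates`, `ip2coordinates`, `coordinates2politics`) and the shape of the JSON response are assumptions made for this example, not taken from the toolkit's documentation, so check them against the actual API before relying on them. The sketch only builds request URLs and parses a response body; it performs no network access.

```python
import json
from urllib.parse import quote

# Assumed base URL and endpoint names (hypothetical; verify against the toolkit docs).
DSTK_BASE = "http://www.datasciencetoolkit.org"
ENDPOINTS = {
    "street_address": "street2coordinates",
    "ip_address": "ip2coordinates",
    "coordinates": "coordinates2politics",
}


def build_dstk_url(service, value, base=DSTK_BASE):
    """Return a GET URL for one service, with the input percent-encoded."""
    if service not in ENDPOINTS:
        raise ValueError(f"unknown service: {service}")
    return f"{base}/{ENDPOINTS[service]}/{quote(value, safe='')}"


def extract_coordinates(response_text):
    """Map each input string to a (latitude, longitude) pair.

    Assumes the response is a JSON object keyed by the input string, where
    each value is either null (input not resolved) or an object with
    "latitude" and "longitude" fields.
    """
    payload = json.loads(response_text)
    return {
        key: (record["latitude"], record["longitude"])
        for key, record in payload.items()
        if record  # skip inputs the service could not resolve
    }
```

A caller would fetch the URL returned by `build_dstk_url` with any HTTP client and pass the response body to `extract_coordinates`.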

The latest version is marked as 0.35 and was released on April 17th, 2011. The Data Science Toolkit was assembled by Pete Warden, and the source code is available at http://github.com/petewarden/dstk

Researchers from the project attended the 11th International Conference on Web Engineering (ICWE 2011), which took place in Paphos (Cyprus) on June 20-24.

Several works were presented at the conference:

  • A keynote from Stefano Ceri: The Anatomy of a Multi-Domain Search Infrastructure;
  • A research paper about Multi-way rank join with parallel access;
  • A live demo of the SeCo system.

The conference also featured a SeCo-sponsored event: the First International Workshop on Search, Exploration and Navigation of Web Data Sources (ExploreWeb 2011).

Researchers from the project attended the 2011 ACM SIGMOD Conference, which took place in Athens (Greece) on June 12-16.

A novel, live demonstration of the SeCo environment was presented at a dedicated booth.

DEMONSTRATION

Search Computing: Multi-domain Search on Ranked Data, authored by Alessandro Bozzon, Daniele Braga, Marco Brambilla, Stefano Ceri, Francesco Corcoglioniti, Piero Fraternali, Salvatore Vadacca

After organizing two workshops in Como, the project decided to go “on the road”. Several workshop proposals have been accepted at conferences such as VLDB, ISWC, ICWE, and ECOWS. More details here, or on the workshops’ Websites.

At ICWE 2011 in Paphos, Cyprus (June) we organize the
ExploreWeb 2011 Workshop, chaired by Brambilla, Fraternali, and Schwabe, see:
http://exploreweb.search-computing.org/


At VLDB, we organize the “Very Large Data Search” Workshop, chaired
by M. Brambilla, F. Casati, and S. Ceri, with Hector Garcia-Molina
and Alon Halevy as keynote speakers, see: http://vlds2011.search-computing.net/

Also at VLDB, we sponsor the DBRank 2011 workshop, chaired by Chakrabarti
and Martinenghi, with Jan Chomicki as speaker, see:
http://www.cs.uwaterloo.ca/conferences/dbrank/2011/

At ECOWS 2011 in Lugano, Switzerland (September) we organize the
DataView Workshop, chaired by Bozzon, Comai, and Norrie, see:
http://dataview.como.polimi.it

At ISWC 2011 in Bonn, Germany (October) we organize the
OrdRing 2011 Workshop, chaired by Della Valle, Horrocks and Bozzon, see:
http://ordring2011.search-computing.org/

Researchers from the project attended the 20th International World Wide Web Conference (WWW 2011), which took place in Hyderabad (India) from March 28th to April 1st.

A novel, live demonstration of the Liquid Query search interaction paradigm was presented at a dedicated booth.

DEMONSTRATION

Exploratory search in multi-domain information spaces with Liquid Query, authored by Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Piero Fraternali, and Salvatore Vadacca.

In these times of social networks and cloud-based productivity platforms, traditional search engines fail to provide tools and services able to collect and organize one’s Web information. Greplin tries to overcome this limitation by providing a personal search engine for all the data you keep locked away in the cloud.

Greplin can interface with several Web and social data services (typically, all those services that provide an API), such as Gmail, Docs, Facebook, Twitter, and Dropbox. A full list of supported data sources is available here.

At the time of writing, Greplin does not provide a public API to access its search functionality, but one appears to be in the works.

Here is a presentation video of Greplin.

The ParkWhiz API provides developers with access to ParkWhiz’s real-time parking and event data in major US cities, airports, venues, and events. The ParkWhiz API deals with 4 types of objects:

  • Parking locations: a specific geographic location; it contains the description of the physical location of a parking spot.
  • Parking listings: pricing and availability information for a parking location.
  • Venues: points of interest, such as theatres, stadiums, or airports. A venue object describes the geographic location of the venue, as well as any events occurring at that venue.
  • Events: describe the start, end, and name of events occurring at a venue. Think of events as a pre-built search query for parking.

The API allows developers to search for available parking at a specific location and time, and also to create a ParkWhiz reservation.
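
To make the relationship among the four object types more concrete, here is a minimal in-memory sketch. All class names, field names and the `listings_for_event` helper are hypothetical and chosen for illustration only; they do not reflect the actual ParkWhiz API schema. The helper shows one way to read the idea that an event acts as a pre-built parking query: it keeps the available listings whose time window covers the whole event and orders them by price.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ParkingLocation:
    """Physical description of a parking spot (hypothetical fields)."""
    location_id: int
    description: str
    latitude: float
    longitude: float


@dataclass
class ParkingListing:
    """Pricing and availability for one parking location (hypothetical fields)."""
    location_id: int
    start: datetime
    end: datetime
    price: float
    available: bool


@dataclass
class Event:
    """Name, start and end of something happening at a venue."""
    name: str
    start: datetime
    end: datetime


@dataclass
class Venue:
    """Point of interest together with the events it hosts."""
    name: str
    latitude: float
    longitude: float
    events: list = field(default_factory=list)


def listings_for_event(event, listings):
    """Return available listings covering the whole event, cheapest first."""
    matches = [
        listing for listing in listings
        if listing.available
        and listing.start <= event.start
        and listing.end >= event.end
    ]
    return sorted(matches, key=lambda listing: listing.price)
```

A real client would obtain these objects from the API's responses rather than building them by hand; the sketch only illustrates how the object types fit together.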

Mombo API

Mombo is a movie recommendation and rating Web application powered by real-time sentiment analysis on Twitter and other social networks. Mombo provides an API that enables developers to access topical lists of movies (in theaters, most popular, coming soon) and specific information for individual movies in the Mombo.com database.

Washington Metropolitan Area Transit Authority API

The WMATA API provides access to Washington Metropolitan Area Transit Authority transparent data sets, including information about:

  • rail and bus stations
  • rail and bus lines
  • bus positions
  • arrival time estimates
  • incident information (rail, elevators, bus)

The Future of Search

In a recent blog post, Jim Jansen discusses the outcomes of an expert study, funded by , about the future of search.

The study, which involved 54 search experts from the U.S., EMEA and Asia, is available here; it contains the following major findings, all related to the project’s goals:

  • Search will increasingly structure a wider range of data (social, UGC, video).
  • Users want more than links to sites and documents with matching keywords. They expect the engine to figure out what they really need.
  • Aggregation of Data Silos: the search silo (index, history) will merge with other silos (social, location, purchase info, rich media, mobile apps) to give users what they want in fewer clicks.
    • For a movie, you might want to show info from different silos: expert reviews, friends’ comments, show times.
  • Search Engines must extract more value from social data, possibly using semantic/natural language, and cultural inference to understand what people are really seeking.
  • Search will be more and more ubiquitously enacted (PC and mobile).
  • Community participation will play a greater role in search quality, using the social graph as filter (as, for instance, in Blekko), or using UGC to augment search results.
  • Search engines will shift towards providing personalized results that leverage user history and context.
  • Search engines will become more conversational as they go back and forth with users to refine a user’s query.
  • Specialized vertical search sites are growing: users use vertical websites and apps to get faster, more relevant and more streamlined results. Mobile vertical apps are becoming increasingly popular because they reduce the number of clicks required.
  • Real-time search will grow, but not everything requires freshness: the explosion of real-time information via Facebook and Twitter will drive user demand for fresh and new data, but not everything needs to be fresh to be actionable and interesting.
    • Nice example: the MIT Senseable City project allows users to access real-time data streams captured via cell phones as they move in urban spaces http://senseable.mit.edu/.
  • There will be a growth of different UI paradigms for displaying information that is more relevant to the context of specific search query.
  • The results experience will move from text and links to visually rich results, surfacing, for example, maps, weather charts/tables and other visualization of data as part of the result.
  • Q&A will be important, especially when obtained through community building.
We thank Alan Dix for suggesting the source of this blog post.
© 2012 Search Computing Blog