Browsing Posts in Structured Data Publishing

Yahoo! is promoting an important initiative to democratize research on big data. The initiative, named Webscope, gives academic researchers access to a collection of datasets, all of which have been “reviewed to conform to Yahoo!’s data protection standards” on privacy.

Language and graph data feature prominently among the available datasets, but some datasets also cover topics such as advertising, marketing and ratings.

More information about this initiative is available on the Webscope Website.

Prof. Zicari interviewed Dr. Alon Y. Halevy, head of the Structured Data Group at Google Research, on Google Fusion Tables and the importance of large-scale data management tools.

The full transcript of the interview is available on the ODBMS.org Web site.

The Data Science Toolkit is a “collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces”.

Examples of the services provided by the toolkit are:

  • Street Address to Coordinates conversion: calculates the latitude/longitude coordinates for a postal address.
    Currently restricted to the US and UK.
  • File to Text conversion: extracts text from PDFs, Word documents and Excel spreadsheets. It also recovers text from JPEG, PNG or TIFF images of scanned documents.
  • Coordinates to political areas conversion: returns the country, region, state, county, constituencies and neighborhood a point is inside.
  • GeoDict: pulls country, city and region names from unstructured English text, and returns their coordinates.
  • IP Address to Coordinates conversion: calculates country, state, city and latitude/longitude coordinates for IP addresses.

The toolkit also contains services for text analysis, such as the Text To People and the Text To Time services.
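As a rough sketch of how the conversion services listed above could be called from Python, the snippet below only builds the HTTP requests and does not send them. The base URL and the endpoint paths (`/street2coordinates` and `/ip2coordinates/<ip>`) are assumptions about a DSTK server and may differ between versions or self-hosted installations; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL of a DSTK server; point this at your own installation.
DSTK_BASE = "http://www.datasciencetoolkit.org"


def street2coordinates_request(addresses, base=DSTK_BASE):
    """Build (but do not send) a POST request carrying a JSON array of addresses."""
    body = json.dumps(list(addresses)).encode("utf-8")
    return urllib.request.Request(
        f"{base}/street2coordinates",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ip2coordinates_url(ip, base=DSTK_BASE):
    """Build the GET URL for an IP address lookup (assumed endpoint path)."""
    return f"{base}/ip2coordinates/{ip}"
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON body of the response.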

The latest version is 0.35, released on April 17th, 2011. The Data Science Toolkit was assembled by Pete Warden, and the source code is available at http://github.com/petewarden/dstk

Researchers from the Search Computing project attended the 2011 ACM SIGMOD Conference, which took place in Athens (Greece) on June 12-16.

A novel, live demonstration of the SeCo environment was presented at a dedicated booth.

DEMONSTRATION

Search Computing: Multi-domain Search on Ranked Data, authored by Alessandro Bozzon, Daniele Braga, Marco Brambilla, Stefano Ceri, Francesco Corcoglioniti, Piero Fraternali, Salvatore Vadacca


Researchers from the Search Computing project attended the 20th International World Wide Web Conference (WWW 2011), which took place in Hyderabad (India) from March 28th to April 1st.

A novel, live demonstration of the Liquid Query search interaction paradigm was presented at a dedicated booth.

DEMONSTRATION

Exploratory search in multi-domain information spaces with Liquid Query, authored by Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Piero Fraternali, and Salvatore Vadacca.


The ParkWhiz API gives developers access to ParkWhiz’s real-time parking and event data for major US cities, airports, venues, and events. The API deals with four types of objects:

  • Parking locations: a specific geographic location; it contains the description of the physical location of a parking spot.
  • Parking listings: pricing and availability information for a parking location.
  • Venues: points of interest, such as theatres, stadiums, or airports. A venue object describes the geographic location of the venue, as well as any events occurring at that venue.
  • Events: describe the start, end, and name of events occurring at a venue. Think of events as a pre-built search query for parking.

The API allows developers to search for available parking at a specific location and time, and also to create a ParkWhiz reservation.
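The four object types can be pictured with a small data model. The sketch below is purely illustrative: every class and field name is hypothetical and is not taken from the ParkWhiz API. It shows how an event can act as a pre-built parking query, by selecting the available listings whose time window covers the whole event.

```python
from dataclasses import dataclass
from datetime import datetime


# All class and field names below are hypothetical, chosen for illustration only.
@dataclass
class ParkingLocation:
    location_id: int
    address: str
    lat: float
    lng: float


@dataclass
class ParkingListing:
    location_id: int
    start: datetime
    end: datetime
    price: float
    available: bool


@dataclass
class Venue:
    venue_id: int
    name: str
    lat: float
    lng: float


@dataclass
class Event:
    venue_id: int
    name: str
    start: datetime
    end: datetime


def listings_for_event(event, listings):
    """Return available listings that cover the whole event, cheapest first."""
    matches = [
        listing
        for listing in listings
        if listing.available
        and listing.start <= event.start
        and listing.end >= event.end
    ]
    return sorted(matches, key=lambda listing: listing.price)
```

A real client would obtain these objects from the API's responses rather than build them by hand.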

Mombo API

Mombo is a movie recommendation and rating Web application powered by real-time sentiment analysis of Twitter and other social networks. Mombo provides an API that lets developers access topical lists of movies (in theaters, most popular, coming soon) and specific information about individual movies in the Mombo.com database.

Washington Metropolitan Area Transit Authority API

The WMATA API provides access to Washington Metropolitan Area Transit Authority transparent data sets, including information about:

  • rail and bus stations
  • rail and bus lines
  • bus positions
  • arrival time estimates
  • incident information (rail, elevators, bus)

Google Shopping API


Google announced the release of the Shopping API, a new set of Web Application Programming Interfaces (APIs) meant to replace the existing Google Base APIs. The new Shopping APIs have two main components: Content and Search. Together they form a single CRUD infrastructure for product data management.

On one hand, the Content API enables retailers to upload their product data to Google, and to make incremental updates to frequently changing attributes like price and availability.

On the other hand, the Search API provides access to product data. After creating a new project in the APIs console, a developer can issue queries such as the following one:

https://www.googleapis.com/shopping/search/v1/public/products?key=key&country=US&q=digital+camera&alt=atom

This query returns a feed of products sold in the United States that match the keywords digital and camera. With a registered account, the new Google Shopping API has a default limit of 2,500 queries per day.
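A minimal sketch of how the query URL shown above could be assembled in Python. The endpoint and the parameter names (`key`, `country`, `q`, `alt`) are the ones that appear in that URL; the function name and its defaults are hypothetical, and the snippet builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Endpoint copied from the example query; it may no longer be live.
SEARCH_ENDPOINT = "https://www.googleapis.com/shopping/search/v1/public/products"


def build_search_url(api_key, query, country="US", alt="atom"):
    """Build a product search URL; spaces in the query are encoded as '+'."""
    params = {"key": api_key, "country": country, "q": query, "alt": alt}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

For example, `build_search_url("key", "digital camera")` reproduces the example URL, with `key` standing in for a real API key.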

The API supports both structured and free-text search. Results can be ordered by relevance, novelty, or price. The diversity of the products matching a query can be increased by using the API's crowding mechanism, which restricts the number of products that share an equivalent property.
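The idea behind crowding can be illustrated on the client side. The sketch below is not the API's implementation and does not use its request syntax; it is a hypothetical helper that keeps at most a fixed number of products for each distinct value of a chosen property, preserving the original result order.

```python
from collections import defaultdict


def crowd(products, attribute, max_per_value):
    """Keep at most max_per_value products per distinct value of attribute."""
    seen = defaultdict(int)  # how many products were kept for each value
    kept = []
    for product in products:
        value = product.get(attribute)
        if seen[value] < max_per_value:
            seen[value] += 1
            kept.append(product)
    return kept
```

With `attribute="brand"` and `max_per_value=1`, for instance, a result list dominated by one brand is reduced to the top-ranked product of each brand.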

The Google Base API will be fully deactivated on June 1, 2011. Some non-shopping data types (such as jobs, real estate, events, and activities) won’t be supported anymore.

Researchers from the Search Computing project attended the 8th International Conference on Service Oriented Computing, which took place in San Francisco from December 7 to December 10, 2010.

Two works related to Search Computing were presented: a SeCo demonstration and a research paper about the SeCo architecture.

Demonstration

Panta Rhei: Optimized and Ranked Data Processing over Heterogeneous Sources authored by Daniele Braga, Francesco Corcoglioniti, Michael Grossniklaus and Salvatore Vadacca.

 

Salvatore Vadacca presenting the SeCo demonstration


In the era of digital information, the value of data resides not only in its volume and quality, but also in the additional information that can be inferred from the combination (aggregation, comparison and join) of such data. There is a concrete need for data processing solutions that combine distributed and heterogeneous data sources, such as Web services, relational databases, and even search engines, that can all be modeled as services. In this demonstration, we show how our Panta Rhei model addresses the challenge of processing data over heterogeneous sources to provide feasible and ranked combinations of these services.
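To make the notion of ranked combinations concrete, here is a deliberately naive sketch. It is not the Panta Rhei model: it assumes that two services each return a list of results with a `score` field, joins them on a shared attribute, and ranks each combination by the product of the two scores. All names and the scoring function are assumptions made for illustration.

```python
import heapq
from itertools import product


def top_k_combinations(left, right, join_key, k):
    """Join two result lists on join_key and return the k best-scoring pairs."""
    joined = (
        (l["score"] * r["score"], l, r)  # assumed scoring: product of scores
        for l, r in product(left, right)
        if l[join_key] == r[join_key]
    )
    # nlargest compares only the combined score, never the result dicts.
    return heapq.nlargest(k, joined, key=lambda item: item[0])
```

This version materializes every matching pair before ranking, so it ignores the cost of calling each service, which an optimized approach would need to take into account.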

Research Paper

A Service-Based Architecture for Multi-domain Search on the Web authored by Alessandro Bozzon, Marco Brambilla, Francesco Corcoglioniti, and Salvatore Vadacca.

Search engines are exploiting more and more named entity identification in the query analysis phase.
Besides increasing the precision of the results, this enables the generation of result pages more suitable to the typical needs of users with respect to the identified entities.

The attached document considers queries that return mono-domain results, where the domain represents a specific field of interest such as City, People, Movies, etc., and characterizes the problem of defining the layout of such results. In particular, it analyzes how the main current search engines (Google, Bing, Yahoo) handle the definition of the result page layout.


The final objective is to describe a conceptual definition of the Web search result layout problem, by identifying: the parameters involved in the layout design, the tuning dimensions available for optimizing the result layout, and the possible strategies that can be adopted for producing such layouts.

This was the topic of a position paper at the DataView workshop within the OTM 2010 conference, Crete, Greece.

A nice overview of data visualization is available in this work by Geoff McGhee from Stanford University, comprising interviews, summaries, insights and visions:
Journalism in the Age of Data: A Video Report on Data Visualization.


http://datajournalism.stanford.edu/
