Browsing Posts in Data Visualization

In the model proposed by Kuhlthau in 1991, the following phases are envisioned in the Information Seeking Process (ISP):

  • Initialization marks the introduction of a problem, including its definition and, based on this, the provision of suggested solutions drawn from previously executed processes.
  • During the second phase, Selection, users identify the general area for investigation through an overview of all available topics, together with automatic topic suggestions.
  • The third stage, Exploration, is considered the most difficult of the entire process, as it involves exploring the general topic(s) in order to extend understanding and relate it to what is already known. It requires a graphical representation of information that the user can understand, self-explanatory and easy-to-use interaction, access to details on demand, sorting and paging techniques to handle large datasets, functionalities for showing information at different levels of detail, grouping and clustering operations to highlight a specific data dimension, and suggestions on how to expand the current set of topics with related ones.
  • The fourth stage, Formulation, is the phase in which a focused perspective on the topic emerges, by selecting hypotheses through filtering or information re-shaping. It requires the interactive and intuitive formulation and modification of filters, the combination of different filters, the traceability of the effects caused by each of them, and the possibility to change the focus (pivoting).
  • The Collection task gathers information and requires easy mechanisms to select interesting findings and to export the selected information for further use in other systems. Finally, the Presentation task is devoted to presenting the collected information by providing different ways to visualize the results.
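The Formulation requirements (combining filters and tracing the effect of each one) can be illustrated with a small sketch. The item fields, filter labels and data below are hypothetical; the snippet simply applies filters in sequence and records how many items each one removed, which is one possible way to provide the traceability the model asks for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Item = Dict[str, object]


@dataclass
class NamedFilter:
    """A filter with a human-readable label, so its effect can be reported."""
    name: str
    predicate: Callable[[Item], bool]


def apply_filters(
    items: List[Item], filters: List[NamedFilter]
) -> Tuple[List[Item], List[Tuple[str, int, int]]]:
    """Apply filters in order; return the surviving items and a trace of
    (filter name, items before, items after) for each step."""
    trace: List[Tuple[str, int, int]] = []
    current = list(items)
    for f in filters:
        before = len(current)
        current = [item for item in current if f.predicate(item)]
        trace.append((f.name, before, len(current)))
    return current, trace


# Hypothetical data: events with a city and a ticket price.
events: List[Item] = [
    {"title": "A", "city": "Milano", "price": 30},
    {"title": "B", "city": "Como", "price": 15},
    {"title": "C", "city": "Milano", "price": 12},
]
result, trace = apply_filters(
    events,
    [
        NamedFilter("city = Milano", lambda e: e["city"] == "Milano"),
        NamedFilter("price < 20", lambda e: e["price"] < 20),
    ],
)
for name, before, after in trace:
    print(f"{name}: {before} -> {after}")
```

Because each filter is a separate named object, removing or reordering one and re-running the pipeline immediately shows how the result set changes.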

In our recent study we evaluated Linked Data exploration tools according to their coverage of these phases.

We classified the tools into four categories:

Linked Data Search Engines are devoted to crawling and indexing all Linked Data published on the Web, implementing effective ranking algorithms, and exposing search services to machines and humans, but with no specific focus on supporting information seeking processes. The first two Linked Data search engines to appear were Swoogle and Watson. Later attempts differentiate themselves by extending or improving one of these three basic features. Sindice is a remarkable example of a modern LD search engine, as it focuses on scaling to very large quantities of data and on supporting simple reasoning mechanisms for inverse-functional properties.
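Reasoning over inverse-functional properties typically means treating resources that share a value for such a property as the same entity (often called "smushing"). The following is a minimal, generic sketch of that idea using union-find; it is not Sindice's implementation, and the triples are hypothetical. `foaf:mbox` is declared inverse-functional in the FOAF vocabulary.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def smush(triples: Iterable[Triple], ifps: Set[str]) -> Dict[str, str]:
    """Map each subject to a canonical representative: subjects that share a
    value for any inverse-functional property end up in the same group."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest URI as root (deterministic output).
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    first_subject: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        find(s)  # register every subject, even one without IFP values
        if p in ifps:
            key = (p, o)
            if key in first_subject:
                union(first_subject[key], s)
            else:
                first_subject[key] = s
    return {s: find(s) for s in list(parent)}


triples = [
    ("ex:alice1", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice2", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice2", "foaf:name", "Alice"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
]
canonical = smush(triples, {"foaf:mbox"})
print(canonical)
```

Once every subject has a canonical URI, triples can be rewritten onto it so that statements from different sources about the same person are aggregated.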

Linked Data Browsers were first demonstrated in Tabulator. Using outline and table modes, Tabulator provides a way to browse RDF data published on the Web. Disco is a lighter version of Tabulator, meant for debugging sites that publish LD. Another notable example of an LD browser is Marble, which instead aims to format Linked Data content for XHTML clients using Fresnel lenses and formats, including colors that convey provenance information. The first industrial result is OpenLink Data Explorer, which allows Web users to explore the Linked Data that may underlie a Web page. LD browsers are naturally geared toward triple-level exploration, with no support for search, filtering, data re-shaping or alternative visualizations; the only exception in the group is Tabulator, whose collapsible hierarchical visualization enables custom exploration.
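The outline mode mentioned above can be sketched as a recursive rendering of a resource's properties, where only the nodes the user has expanded show their own properties. This is an illustrative sketch over hypothetical triples, not Tabulator's code; the `+`/`-` markers for collapsed and expanded nodes are an arbitrary convention chosen here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def outline(triples: List[Triple], root: str, expanded: Set[str]) -> List[str]:
    """Render an indented outline of `root`. Objects that are themselves
    described in the data are expandable: '+' marks a collapsed node, '-' an
    expanded one whose properties are shown one level deeper."""
    by_subject: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    lines: List[str] = [root]

    def walk(node: str, depth: int, path: Set[str]) -> None:
        for p, o in by_subject.get(node, []):
            # Nodes already on the current path are not expandable (avoids cycles).
            expandable = o in by_subject and o not in path
            is_open = expandable and o in expanded
            marker = "-" if is_open else ("+" if expandable else " ")
            lines.append(f"{'  ' * depth}{marker} {p}: {o}")
            if is_open:
                walk(o, depth + 1, path | {o})

    walk(root, 1, {root})
    return lines


triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:knows", "ex:alice"),
]
print("\n".join(outline(triples, "ex:alice", expanded=set())))
print("\n".join(outline(triples, "ex:alice", expanded={"ex:bob"})))
```

Expanding a node is just adding it to the `expanded` set and re-rendering, which is what gives the user control over how deep the exploration goes.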

Visual Interfaces for Linked Data Repositories allow users to navigate Linked Data repositories both through a SPARQL endpoint and through a rich visual interface. This aspect is relevant since many academic and industrial research labs claim to be able to keep a repository containing all (or part of) the datasets published as Linked Data updated in real time, by mirroring changes from the original data sources. RDK Explorer, Sig.ma, Haystack and Uberblic are a few popular examples of such visual interfaces for Linked Data Repositories.
RDK Explorer offers a panel-based interface to large, heterogeneous sets of information about people, publications, research topics and projects. The interface displays detailed information about the current resource together with a graph showing the context of related resources (from which the user can change the current resource). VisiNav is a system based on a visual query construction paradigm that exploits six atomic query operations, such as keyword search, facet selection and path traversal. Sig.ma offers an interface where the displayed links are collected from multiple Linked Data sources and merged. A key distinguishing feature of Sig.ma is the incremental display of data as relevant sources are discovered, which enhances the user experience; users can also highlight data provenance and favor (or discard) given data sources. Haystack aggregates Linked Data from multiple arbitrary locations and presents it to the user in a human-readable fashion, with point-and-click semantics that let the user navigate from one piece of data to another. Display is controlled by presentation recommendations, i.e., a kind of stylesheet that can be used to obtain multiple views (e.g., as thumbnails, Web pages, or taxonomies).
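The incremental merge-with-provenance behaviour described for Sig.ma can be sketched as follows. This is not Sig.ma's implementation: the `EntityProfile` class, the `(property, value, source)` record format and the sample data are hypothetical. Each batch of records from a newly discovered source is merged immediately, so a view can be shown before all sources are known, and discarding a source simply hides the values that only that source supports.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (property, value, source)


class EntityProfile:
    """Incrementally merged description of one entity that remembers which
    sources assert each value, so the user can inspect or discard sources."""

    def __init__(self) -> None:
        self._values: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )
        self._discarded: Set[str] = set()

    def add(self, records: Iterable[Record]) -> None:
        """Merge a batch of records, e.g. from a newly discovered source."""
        for prop, value, source in records:
            self._values[prop][value].add(source)

    def discard_source(self, source: str) -> None:
        self._discarded.add(source)

    def view(self) -> Dict[str, List[Tuple[str, List[str]]]]:
        """Current merged view: property -> [(value, supporting sources)],
        ignoring discarded sources; sorted for a stable display."""
        out: Dict[str, List[Tuple[str, List[str]]]] = {}
        for prop in sorted(self._values):
            rows = []
            for value in sorted(self._values[prop]):
                sources = sorted(self._values[prop][value] - self._discarded)
                if sources:
                    rows.append((value, sources))
            if rows:
                out[prop] = rows
        return out


profile = EntityProfile()
profile.add([("name", "Jane Doe", "src:a"), ("affiliation", "Univ. X", "src:a")])
first_view = profile.view()  # shown to the user right away
profile.add([("name", "Jane Doe", "src:b"), ("affiliation", "Univ. Y", "src:c")])
profile.discard_source("src:c")  # the user distrusts source c
second_view = profile.view()
print(second_view)
```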
Uberblic is an industrial research result that provides an integration service tying Linked Data together into a more coherent experience. Other solutions, like Microsoft Pivot, have been adapted to Linked Data browsing too, letting the user explore results by zooming, panning, or pivoting. Our proposal is located in the same application space as the above-mentioned tools, sharing several key features such as native support for incremental exploration, highlighting of data relationships, and navigation. However, only RDK Explorer provides support for the Initialization and Selection steps, while only Pivot and VisiNav support pivoting.

Facet-based systems also share several characteristics with our approach, as they try to build semantically unique search queries by enabling faceted search and navigation through facets and results. In Facet Graphs, facets and the result set are represented as nodes in a graph visualization. The semantic relations between facets and the result set, as well as between facets and other facets, are represented by labeled directed edges between the nodes. Other tools like mSpace, Humboldt and Parallax also allow for hierarchical filtering. Parallax also provides support for expansion with related topics, where the available relationships are those pre-defined in the underlying collection.
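Faceted navigation with hierarchical filtering can be sketched in a few lines. This is a generic illustration rather than the algorithm of any of the tools cited above; the item structure, the `/`-separated encoding of facet hierarchies and the data are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List

Item = Dict[str, str]


def matches(value: str, selected: str) -> bool:
    """Hierarchical match: selecting 'Europe/Italy' also keeps 'Europe/Italy/Milano'."""
    return value == selected or value.startswith(selected + "/")


def refine(items: List[Item], selections: Dict[str, str]) -> List[Item]:
    """Keep the items that match every selected facet value (conjunction)."""
    return [
        item
        for item in items
        if all(f in item and matches(item[f], v) for f, v in selections.items())
    ]


def facet_counts(items: List[Item], facet: str, depth: int) -> Dict[str, int]:
    """Count items per facet value, truncated to the given hierarchy depth."""
    return dict(
        Counter("/".join(item[facet].split("/")[:depth]) for item in items if facet in item)
    )


# Hypothetical items with a hierarchical 'place' facet and a flat 'type' facet.
items = [
    {"name": "A", "place": "Europe/Italy/Milano", "type": "Museum"},
    {"name": "B", "place": "Europe/Italy/Como", "type": "Museum"},
    {"name": "C", "place": "Europe/Greece/Crete", "type": "Beach"},
]
print(facet_counts(items, "place", 2))  # counts shown next to each facet value
italy = refine(items, {"place": "Europe/Italy"})
print(facet_counts(italy, "place", 3))  # drill down one level
```

Showing the counts before the user clicks is what lets each refinement step stay predictable: a facet value with zero matches never needs to be offered.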

Comparison.

Linked data exploration tools. Coverage of ISP phases

As the table above shows, the Liquid Query approach is the one that currently provides the widest coverage of the information seeking process requirements. Unlike all the analyzed solutions, our approach is not aimed at producing one general-purpose tool: thanks to application configurations, we provide a methodology to configure ad-hoc vertical solutions for navigating data within specific domains. Our join-based approach saves the user several exploratory link navigations between concepts, and our tunable global ranking function provides a customizable ranking of combinations of objects. Furthermore, in our work exploration is not confined to data aggregated in one repository but, thanks to value-based joins, can span Linked Data and arbitrary data sources wrapped as Web services. Solution and topic suggestions are not currently covered, but they can be obtained by mining user behavior (studies on these aspects are part of our future agenda). Traceability and export facilities are not currently implemented in our prototypes, but the approach is ready to support them and we plan to complete their implementation in the near future. The online demo covers the Exploration, Formulation, Collection, and Presentation stages, while Initialization and Selection are covered by the configuration of the application, which is performed through specifically devised design tools.
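The combination of value-based joins and a tunable global ranking can be illustrated with a small sketch. It is not the Liquid Query implementation: the two in-memory "services", their fields, the local `score` attribute and the weighted-sum ranking are assumptions chosen for the example. Changing the weights changes which combination comes first, which is what makes the ranking customizable.

```python
from itertools import product
from typing import Dict, List, Tuple

Row = Dict[str, object]


def value_join(
    left: List[Row], right: List[Row], left_key: str, right_key: str
) -> List[Tuple[Row, Row]]:
    """Nested-loop join: pair rows of two sources whose join attributes have equal values."""
    return [(a, b) for a, b in product(left, right) if a[left_key] == b[right_key]]


def rank_combinations(
    pairs: List[Tuple[Row, Row]], w_left: float, w_right: float
) -> List[Tuple[float, Row, Row]]:
    """Score each combination as a weighted sum of the sources' local scores
    (assumed to lie in [0, 1]) and return the combinations best-first."""
    scored = [(w_left * a["score"] + w_right * b["score"], a, b) for a, b in pairs]
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Hypothetical results of two services, joined on the city value.
concerts = [
    {"name": "Jazz Night", "city": "Milano", "score": 0.9},
    {"name": "Opera Gala", "city": "Como", "score": 0.6},
]
hotels = [
    {"name": "Hotel Duomo", "city": "Milano", "score": 0.4},
    {"name": "Lake View", "city": "Como", "score": 0.95},
]
pairs = value_join(concerts, hotels, "city", "city")
balanced = [(c["name"], h["name"]) for _, c, h in rank_combinations(pairs, 0.5, 0.5)]
concert_first = [(c["name"], h["name"]) for _, c, h in rank_combinations(pairs, 0.9, 0.1)]
print(balanced)
print(concert_first)
```

The nested-loop join is the simplest correct choice for a sketch; with real Web services, results arrive in ranked chunks and a practical engine would join them incrementally instead of materializing every pair.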

An amazing video from Hans Rosling.

This video is enabled by Trendalyzer, an information visualization tool for animating statistics that was initially developed by Hans Rosling's Gapminder Foundation and then acquired by Google Inc. in 2007. Some components, such as the Flash-based Motion Chart gadget, have become available for public use as part of the Google Visualization API.

Applications of this kind open several opportunities for SeCo as a data generator for dynamic, animated storytelling based on Web data.

A few examples of solutions for complex / multidomain web data:

http://www.geovista.psu.edu/grants/cdcesda/software/

http://www.thegardnerteam.net/
(new technology in the real estate domain)

Another interesting application is this real-time visualizer of the London Underground status:

London underground mashup

Search engines are increasingly exploiting named entity identification in the query analysis phase.
Besides increasing the precision of the results, this enables the generation of result pages better suited to the typical needs of users with respect to the identified entities.
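A deliberately simplified sketch of the idea: detect a known entity in the query by dictionary (gazetteer) lookup over word n-grams, then pick a domain-specific result layout. Real search engines use far richer entity recognition; the gazetteer entries, domain names and layout descriptions here are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical gazetteer: known entity names -> domain.
GAZETTEER: Dict[str, str] = {
    "milano": "City",
    "como": "City",
    "hans rosling": "People",
    "the matrix": "Movies",
}

# Hypothetical domain-specific result page layouts.
LAYOUTS: Dict[str, str] = {
    "City": "map + weather + points of interest",
    "People": "profile card + news",
    "Movies": "poster + showtimes + reviews",
}


def identify_entity(query: str) -> Optional[Tuple[str, str]]:
    """Return (entity, domain) for the longest word n-gram of the query found
    in the gazetteer, or None if no known entity occurs."""
    tokens = query.lower().split()
    for n in range(len(tokens), 0, -1):  # try longer matches first
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i : i + n])
            if candidate in GAZETTEER:
                return candidate, GAZETTEER[candidate]
    return None


def choose_layout(query: str) -> str:
    """Use the layout of the identified domain, or fall back to a plain ranked list."""
    hit = identify_entity(query)
    return LAYOUTS[hit[1]] if hit else "generic ranked list"


print(choose_layout("hotels in Milano"))
print(choose_layout("cheap flights"))
```

Matching whole word n-grams rather than raw substrings avoids spurious hits such as a city name appearing inside a longer word.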

The attached document considers queries returning mono-domain results, where the domain represents a specific field of interest such as City, People, or Movies, and characterizes the problem of defining the layout of such results. In particular, it analyzes how the main current search engines (Google, Bing, Yahoo!) handle the definition of the result page layout.

search engines: page layout analysis

The final objective is to provide a conceptual definition of the Web search result layout problem by identifying the parameters involved in the layout design, the tuning dimensions available for optimizing the result layout, and the possible strategies that can be adopted for producing such layouts.

This was the topic of a position paper at the DataView workshop, held within the OTM 2010 conference in Crete, Greece.

Choosel is extremely relevant to our work because it allows one to effectively visualize complex data objects. For instance, a combination of widgets can be defined for showing concepts along both geographical and temporal dimensions at the same time. See this demo:

A nice overview of data visualization is available in this very interesting work by Geoff McGhee from Stanford University, comprising interviews, summaries, insights and visions:
Journalism in the Age of Data: A Video Report on Data Visualization.

Data visualization

http://datajournalism.stanford.edu/

The new Web site is online!

The Search Computing Web site is the main source of information for the Search Computing project, and it features:

  • A brand new section dedicated to demonstrators, where you can find several demonstration videos of the prototypes developed within the project, plus access to some live demonstrators
  • Slides, pictures and audio footage of the second Search Computing Workshop, held in Milano and Como, Italy, on May 25-31, 2010. The workshop targeted several “hot topics” of the project, with roughly 50 participants (half invited guests, half SeCo members), including Ricardo Baeza-Yates from Yahoo!, Paolo Boldi, Gabriella Pasi, Roberto Verganti, Tommaso Buganza, Sonia Bergamaschi, Laura Po, Francesco Guerra, Domenico Beneventano, Fabian Suchanek, Georg Gottlob, Sergio Flesca, Florian Daniel, Fabio Casati, Imran Muhammad, Dana Florescu, Donald Kossmann, Norman Paton, Neoklis Polyzotis, Ihab F. Ilyas, Frank Valentin, Paolo Missier, Angela Bachi, Paolo Romano, Luciano Milanesi, Marta Corubolo, etc.
  • A whole section dedicated to the book “Search Computing: Challenges and Directions”, edited by Stefano Ceri and Marco Brambilla (Springer LNCS, Vol. 5950, March 2010).
  • Theses and open positions within the project, plus plenty of additional material such as the slides of the MS course on Search Computing at Politecnico di Milano, publications, and more.

Check it out at: www.search-computing.org

ACM Queue has just published an interesting survey by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky on advanced visualization techniques.


Mechanical Turk (Mturk) is a Web service where users, called turkers, are paid small rewards (a few cents) for short computational tasks called HITs (Human Intelligence Tasks). A contractor generates the HITs, posts them on Mturk, and later downloads all the results.

TurKit is a Java/JavaScript toolkit (developed by the Design Group at MIT) for running iterative tasks on Mechanical Turk. As of today, TurKit is the first example of an iterative-task framework for Mturk: it allows users to perform incremental tasks by automatically generating HITs based on the results of previous HITs.
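The iterative pattern can be sketched without touching the real Mechanical Turk API: here the improvement task and the voters are plain Python functions standing in for HITs answered by workers, an assumption made purely for illustration (TurKit's own API is different). Each round builds an improvement task from the current best result, and a voting step decides whether the new version replaces it.

```python
from typing import Callable, List

# Stand-ins for HITs: in a real deployment each call would post a HIT and
# wait for the workers' answers (hypothetical simplification).
ImproveTask = Callable[[str], str]
VoteTask = Callable[[str, str], str]  # returns whichever text the worker prefers


def iterative_improve(text: str, improve: ImproveTask, voters: List[VoteTask], rounds: int) -> str:
    best = text
    for _ in range(rounds):
        candidate = improve(best)  # the new task is generated from the previous result
        votes = [vote(best, candidate) for vote in voters]
        # Accept the candidate only with a strict majority of votes.
        if sum(v == candidate for v in votes) * 2 > len(votes):
            best = candidate
    return best


# Simulated workers: scripted improvement suggestions, and voters who
# prefer the more detailed (longer) text.
suggestions = iter(["A map of Milano", "A map of Milano's museums", "map"])


def scripted_improve(current: str) -> str:
    return next(suggestions)


def prefer_longer(old: str, new: str) -> str:
    return max(old, new, key=len)


final = iterative_improve("Milano map", scripted_improve, [prefer_longer] * 3, rounds=3)
print(final)
```

The third suggestion is rejected by the vote, so a poor contribution cannot undo earlier improvements.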

Many applications can benefit from this iterative paradigm: turkers can take turns improving a passage of text, verify each other's work by voting on it, or serve as the comparison function of a sorting algorithm. In the context of SeCo, turkers can be employed, for instance, to evaluate the quality of a query response.
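Using workers as the comparison function of a sort can be simulated as follows. The judges are ordinary Python functions standing in for HITs (a hypothetical simplification), and each pairwise comparison is settled by majority vote. With genuinely noisy workers the comparisons may be inconsistent, so the output is only as good as the votes, and every comparison costs a HIT.

```python
import functools
from typing import Callable, List

Judge = Callable[[str, str], bool]  # True if the worker ranks a above b


def crowd_comparator(judges: List[Judge]) -> Callable[[str, str], int]:
    """Build a cmp function in which each comparison is decided by majority vote."""
    def cmp(a: str, b: str) -> int:
        if a == b:
            return 0
        votes_for_a = sum(1 for judge in judges if judge(a, b))
        return -1 if votes_for_a * 2 > len(judges) else 1
    return cmp


def crowd_sort(items: List[str], judges: List[Judge]) -> List[str]:
    return sorted(items, key=functools.cmp_to_key(crowd_comparator(judges)))


# Simulated workers judging query answers: two follow a hidden quality score,
# one always prefers the alphabetically first answer (noise).
quality = {"answer A": 0.2, "answer B": 0.9, "answer C": 0.5}


def careful(a: str, b: str) -> bool:
    return quality[a] > quality[b]


def careless(a: str, b: str) -> bool:
    return a < b


ranked = crowd_sort(list(quality), [careful, careful, careless])
print(ranked)
```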


Via O’Reilly Radar:

ToxicLibs: an independent, open-source library collection for computational design tasks with Java & Processing.

The library, programmed in Java, contains more than 130 classes devoted to computational design which, for the purposes of data visualization, might translate into visualization and interaction features.

The library features packages for managing audio, color, geometry, and physics effects. A set of demo applications is hosted on openprocessing.org.

© 2012 Search Computing Blog