Browsing Posts tagged Natural Language Processing

The Data Science Toolkit is a “collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces”.

Examples of the services provided by the toolkit are:

  • Street Address to Coordinates conversion: calculates the latitude/longitude coordinates for a postal address.
    Currently restricted to the US and UK.
  • File to Text conversion: extracts text from PDFs, Word documents and Excel spreadsheets. It also recovers text from JPEG, PNG or TIFF images of scanned documents.
  • Coordinates to Political Areas conversion: returns the country, region, state, county, constituencies and neighborhood that a point lies inside.
  • GeoDict: pulls country, city and region names from unstructured English text and returns their coordinates.
  • IP Address to Coordinates conversion: calculates the country, state, city and latitude/longitude coordinates for IP addresses.

The toolkit also contains services for text analysis, such as the Text To People and the Text To Time services.
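
Since the services are exposed over REST/JSON, a call from Python can be sketched in a few lines. This is a minimal sketch and not the toolkit's official client: the base URL, the endpoint names (`street2coordinates`, `ip2coordinates`) and the `/<service>/<query>` URL layout are assumptions that should be checked against the DSTK documentation, and the helper names `build_url` and `call_service` are made up for this illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a DSTK server is reachable at this address. The public host may
# not be available, so point this at your own instance if needed.
DSTK_BASE_URL = "http://www.datasciencetoolkit.org"


def build_url(service, query, base_url=DSTK_BASE_URL):
    """Build a GET URL of the assumed form <base>/<service>/<url-encoded query>."""
    return f"{base_url.rstrip('/')}/{service}/{quote(query, safe='')}"


def call_service(service, query, base_url=DSTK_BASE_URL, timeout=10):
    """Send the request and decode the JSON body of the response."""
    url = build_url(service, query, base_url)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, `call_service("ip2coordinates", "8.8.8.8")` would return the decoded JSON for that IP address. Keeping URL construction in its own function makes it possible to inspect the request without touching the network.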

The latest version is marked as 0.35 and was released on April 17, 2011. The Data Science Toolkit was assembled by Pete Warden, and the source code is available at http://github.com/petewarden/dstk

Powerset is a company based in San Francisco, California, that is developing a natural language search engine for the Internet.

Powerset is building a natural language search engine that can find targeted answers to user questions, as opposed to keyword-based search. For example, when confronted with a question of the form ‘which U.S. state has the highest income tax?’, conventional search engines ignore the structure of the question and instead search on the keywords ‘state’, ‘income’ and ‘tax’. Powerset’s product, on the other hand, attempts to use natural language processing to understand the nature of the question and then to search and return a subset of the web that contains the answer. If it works, results from Powerset’s search engine would be more relevant than results from a keyword search engine. From a commercial standpoint, advertising on the results page could also be more relevant and could have a higher revenue potential than that of keyword search engines.

Currently, the company is in the process of “building a natural language search engine that reads and understands every sentence on the Web.” The company has licensed natural language technology from PARC, the former Xerox Palo Alto Research Center.

On May 11, 2008, the company unveiled a tool for searching a fixed subset of Wikipedia using conversational phrases rather than keywords.

On July 1, 2008, Microsoft signed an agreement to acquire Powerset.

[Source: Wikipedia, http://en.wikipedia.org/wiki/Powerset_(company)]

[Website: http://www.powerset.com/]

© 2012 Search Computing Blog