Also known as: Concept Extractor, Collocation Extractor, SIP Extractor
Description
Key Phrase Extractor is a toolkit for extracting key terms
and phrases from text. It is designed to be used in two
main modes:
- Extractor of common (frequently occurring) phrases. These phrases
are known as Collocations.
When used in this mode, the Key Phrase Extractor identifies the terms
and phrases that occur most often in the input text. For example, if
Key Phrase Extractor were to analyze the content of Lucene in Action,
it would find terms like "Lucene" and "search", as well as phrases
such as "inverted index", "information retrieval", "query parser", and
so on.
- Extractor of phrases that distinguish one set of documents from
another (the two sets are known as the foreground and background
corpora). These phrases are known as Statistically Improbable
Phrases, or SIPs.
When used in this mode, the Key Phrase Extractor finds key
differentiating phrases between two document sets. For example, when
given news articles from yesterday and news articles from today, the
Key Phrase Extractor will identify key terms and phrases that make
today's news different from yesterday's. These may turn out to be
names of people such as "Steve Jobs" or "Warren Buffett", or phrases
such as "Swine Flu" or "Somali Pirates", thus identifying people and
concepts that have more mentions today than they had yesterday.
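The two modes above can be illustrated with a minimal Java sketch. It is not the Key Phrase Extractor implementation or its API: the class name PhraseSketch, its method names, the restriction to two-word phrases, and the simple smoothed frequency ratio used as a score are all assumptions made for illustration. The first mode is approximated by counting adjacent word pairs, the second by comparing how often a pair appears in the foreground text relative to the background text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is NOT the Key Phrase Extractor API.
class PhraseSketch {

    // Counts adjacent word pairs (bigrams) in lower-cased text. For brevity,
    // pairs that span sentence boundaries are counted too.
    static Map<String, Integer> countBigrams(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String prev = null;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) continue; // split() can yield a leading empty token
            if (prev != null) counts.merge(prev + " " + word, 1, Integer::sum);
            prev = word;
        }
        return counts;
    }

    // Collocation-style mode: bigrams seen at least minCount times,
    // most frequent first (ties broken alphabetically).
    static List<String> frequentPhrases(String text, int minCount) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countBigrams(text).entrySet());
        entries.removeIf(e -> e.getValue() < minCount);
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> phrases = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) phrases.add(e.getKey());
        return phrases;
    }

    // SIP-style mode: for each foreground bigram, the ratio of its relative
    // frequency in the foreground to its relative frequency in the background.
    // The crude +1 terms only prevent division by zero for unseen phrases.
    // A score above 1 means the phrase is relatively more common in the foreground.
    static Map<String, Double> improbabilityScores(String foreground, String background) {
        Map<String, Integer> fg = countBigrams(foreground);
        Map<String, Integer> bg = countBigrams(background);
        double fgTotal = total(fg);
        double bgTotal = total(bg);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : fg.entrySet()) {
            double fgRate = (e.getValue() + 1) / (fgTotal + 1);
            double bgRate = (bg.getOrDefault(e.getKey(), 0) + 1) / (bgTotal + 1);
            scores.put(e.getKey(), fgRate / bgRate);
        }
        return scores;
    }

    private static double total(Map<String, Integer> counts) {
        double sum = 0;
        for (int c : counts.values()) sum += c;
        return sum;
    }
}
```

With today's articles as the foreground and yesterday's as the background, a phrase that appears only today scores well above 1, while a phrase that is equally or more common in yesterday's articles scores near or below 1. A production extractor would likely handle longer phrases, stop words, and a more robust statistic.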
Business Value / Benefits
- Extracts key concepts from content
- Extracts the key concepts that distinguish one set of content from another
- Identifies key terms and phrases useful for describing main concepts from a larger piece of text
- Finds key terms and phrases that enhance search results by serving as additional navigational metadata
Integration
Key Phrase Extractor exposes a simple Java API. Given a piece of text,
it returns a list of phrases ordered by their computed score. The API
can also filter the returned phrases, and the toolkit ships with
several useful filters. Because the API is simple and extensible, you
can write and plug in your own filters, too.
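To give a feel for what a pluggable filter might look like, here is a minimal sketch. Every name in it (ScoredPhrase, PhraseFilter, MinScoreFilter, FilterSketch) is hypothetical and chosen for illustration; the actual Key Phrase Extractor types and method signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; the real API may differ.
class ScoredPhrase {
    final String text;
    final double score;

    ScoredPhrase(String text, double score) {
        this.text = text;
        this.score = score;
    }
}

// A filter decides whether a phrase is kept in the result list.
interface PhraseFilter {
    boolean accept(ScoredPhrase phrase);
}

// Example filter: keeps only phrases whose score reaches a threshold.
class MinScoreFilter implements PhraseFilter {
    private final double minScore;

    MinScoreFilter(double minScore) {
        this.minScore = minScore;
    }

    @Override
    public boolean accept(ScoredPhrase phrase) {
        return phrase.score >= minScore;
    }
}

class FilterSketch {
    // Keeps the phrases accepted by every filter, highest score first.
    static List<ScoredPhrase> apply(List<ScoredPhrase> phrases, PhraseFilter... filters) {
        List<ScoredPhrase> kept = new ArrayList<>();
        for (ScoredPhrase phrase : phrases) {
            boolean accepted = true;
            for (PhraseFilter filter : filters) {
                if (!filter.accept(phrase)) {
                    accepted = false;
                    break;
                }
            }
            if (accepted) kept.add(phrase);
        }
        kept.sort((a, b) -> Double.compare(b.score, a.score));
        return kept;
    }
}
```

Because the hypothetical PhraseFilter has a single method, a custom filter can be written as a lambda, for example `p -> p.text.contains(" ")` to keep only multi-word phrases.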
FAQ
None - ask us!