Some of our products are available for evaluation. Please contact us to inquire.

Contact Sales:
+1 646-206-6014
info@sematext.com

Language Identifier

Also known as: LangID, Language Guesser, Language Detector

Language Identifier detects the language used in a piece of text (e.g., a document, email, or web page). It uses statistical NLP (Natural Language Processing) to learn about languages from training data and to identify them in new text.

Business Value / Benefits

Do You Need It?

How do you determine if Language Identifier is for you?

Integration

Language Identifier exposes a simple Java API. Given a piece of text, it returns a list of languages ordered by confidence score. It integrates seamlessly with Lucene and Solr, but it is not tied to search and can be used in applications that have nothing to do with search. It also runs as a REST web service, so any software component that can make HTTP requests can call it.
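
To illustrate what "a list of languages ordered by confidence score" can look like in Java, here is a minimal sketch. It is not the Language Identifier API, which this page does not document: the names LangIdSketch, Guess, train and identify are hypothetical, and the scoring (cosine similarity over character trigram counts) is one common statistical approach chosen for illustration, not necessarily the one the product uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; class and method names are assumptions,
// not the product's actual Java API.
class LangIdSketch {

    /** One ranked guess: a language label and a similarity score in [0, 1]. */
    static final class Guess {
        final String language;
        final double confidence;
        Guess(String language, double confidence) {
            this.language = language;
            this.confidence = confidence;
        }
    }

    // One trigram-count profile per trained language.
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Counts overlapping character trigrams in the lower-cased text. */
    static Map<String, Integer> trigrams(String text) {
        String s = " " + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Adds the trigram counts of a training sample to the language's profile. */
    void train(String language, String sample) {
        Map<String, Integer> profile = profiles.computeIfAbsent(language, k -> new HashMap<>());
        trigrams(sample).forEach((gram, n) -> profile.merge(gram, n, Integer::sum));
    }

    /** Cosine similarity between two trigram-count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns every trained language, highest score first. */
    List<Guess> identify(String text) {
        Map<String, Integer> query = trigrams(text);
        List<Guess> guesses = new ArrayList<>();
        profiles.forEach((lang, profile) -> guesses.add(new Guess(lang, cosine(query, profile))));
        guesses.sort((x, y) -> Double.compare(y.confidence, x.confidence));
        return guesses;
    }

    public static void main(String[] args) {
        LangIdSketch id = new LangIdSketch();
        id.train("en", "the dog and the cat sit on the mat and sleep all day");
        id.train("de", "der hund und die katze sitzen auf der matte und schlafen den ganzen tag");
        for (Guess g : id.identify("the cat and the dog")) {
            System.out.printf("%s %.3f%n", g.language, g.confidence);
        }
    }
}
```

With the two toy samples above, the English profile ranks first for the English query. A real deployment would need far larger training sets per language, which is consistent with the accuracy note in the FAQ.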

FAQ

Q: Which languages can Language Identifier recognize?
A: It can detect any language it has been trained on, regardless of the character set or encoding used.
Q: How accurate is the Language Identifier?
A: Accuracy depends on the quality and size of the training set.
