7/19/2017 5 Heroic Python NLP Libraries

5 Heroic Python NLP Libraries


October 29, 2016

Natural language processing (NLP) is an exciting field in data science and artificial
intelligence that deals with teaching computers how to extract meaning from text. In this
guide, we'll be touring the essential stack of Python NLP libraries.

These packages handle a wide range of tasks such as part-of-speech (POS) tagging,
sentiment analysis, document classification, topic modeling, and much more.


Why only 5 libraries?

We write every guide with the practitioner in mind. There are dozens of packages
for NLP out there... but you'll cover all the important bases once you master a handful of
them. This is an opinionated guide that features the 5 Python NLP libraries we've found
to be the most useful.

Do I need to learn every library below?

No, it all depends on your use case. Here's a summary:

We recommend NLTK only as an education and research tool. Its modularized
structure makes it excellent for learning and exploring NLP concepts, but it's not
meant for production.

TextBlob is built on top of NLTK, and it's more accessible. This is our favorite
library for fast prototyping or building applications that don't require highly optimized
performance. Beginners should start here.

Stanford's CoreNLP is a Java library with Python wrappers. It's in many
existing production systems due to its speed.

SpaCy is a new NLP library that's designed to be fast, streamlined, and production-ready.
It's not as widely adopted, but if you're building a new application, you should
give it a try.

Gensim is most commonly used for topic modeling and similarity detection. It's not a
general-purpose NLP library, but for the tasks it does handle, it handles them well.

Why are they heroic?


Because they are valiant! So without further ado...


The Conqueror: NLTK
The Prince: TextBlob
The Mercenary: Stanford CoreNLP
The Usurper: spaCy
The Admiral: gensim

The Conqueror: NLTK


You can't talk about NLP in Python without mentioning NLTK. It's the most
famous Python NLP library, and it's led to incredible breakthroughs in the field. NLTK is
responsible for conquering many text analysis problems, and for that we pay homage.

NLTK is also popular for education and research. On its own website, NLTK claims to be
"an amazing library to play with natural language."

In our experience, the key word there is "play." NLTK has over 50 corpora and lexicons,
9 stemmers, and dozens of algorithms to choose from. It's an academic researcher's
theme park.
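
To make the idea of a stemmer concrete, here is a minimal suffix-stripping sketch in plain Python. It is a toy and not one of NLTK's stemmers: the two suffix lists and the `strip_suffix` helper are assumptions made up for illustration, and real algorithms such as Porter's apply many more rules. It mainly hints at why having several stemmers to choose from matters, since a more aggressive rule set collapses more word forms at the risk of over-stemming.

```python
# Toy suffix-stripping stemmers (illustrative only, not NLTK's algorithms).

# Hypothetical rule sets: a conservative one and a more aggressive one.
LIGHT_SUFFIXES = ("ing", "ed", "s")
AGGRESSIVE_SUFFIXES = ("ational", "ation", "ness", "ing", "ed", "ly", "s")


def strip_suffix(word, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def light_stem(word):
    """Strip only very common inflectional endings."""
    return strip_suffix(word, LIGHT_SUFFIXES)


def aggressive_stem(word):
    """Also strip some derivational endings such as -ness and -ly."""
    return strip_suffix(word, AGGRESSIVE_SUFFIXES)


# Compare the two rule sets side by side.
for w in ["walking", "walked", "walks", "kindness", "quickly"]:
    print(w, light_stem(w), aggressive_stem(w))
```

The two toy stemmers agree on simple inflections like "walking" but diverge on words like "kindness", which is the kind of trade-off you weigh when picking among real stemmers.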

Yet, this is also one of NLTK's major downsides. It's heavy and slippery, and it has a
steep learning curve. The second major weakness is that it's slow and not production-
ready.

The next 3 libraries will address these weaknesses.

Resources

NLTK Book - Complete course on Natural Language Processing in Python with NLTK.
Dive into NLTK - Detailed 8-part tutorial on using NLTK for text processing.

The Prince: TextBlob

TextBlob sits on the mighty shoulders of NLTK and another package called Pattern. In
fact, we left out Pattern from this list because we recommend TextBlob instead.

TextBlob makes text processing simple by providing an intuitive interface to NLTK. It's a
welcome addition to an already solid lineup of Python NLP libraries because it has a
gentle learning curve while boasting a surprising amount of functionality.

For example, let's say you wanted to find a text's sentiment score. You can do that out of
the box:

Python
from textblob import TextBlob

# The default analyzer returns a (polarity, subjectivity) pair
opinion = TextBlob("EliteDataScience.com is dope.")
print(opinion.sentiment)

By default, the sentiment analyzer is the PatternAnalyzer from the Pattern library. But
what if you wanted to use a Naive Bayes analyzer? You can easily swap to a pre-trained
implementation from the NLTK library.

Python
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

# Pass the analyzer explicitly to override the default PatternAnalyzer
opinion = TextBlob("EliteDataScience.com is dope!", analyzer=NaiveBayesAnalyzer())
print(opinion.sentiment)
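
For intuition about what a Naive Bayes analyzer does under the hood, here is a minimal bag-of-words sketch in plain Python. It is not TextBlob's or NLTK's implementation: the tiny training set, the `train` and `classify` helpers, and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (text, label) pairs.
TRAINING_DATA = [
    ("i love this great library", "pos"),
    ("what a wonderful and fun tool", "pos"),
    ("this is awful and slow", "neg"),
    ("i hate this terrible interface", "neg"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior: share of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(classify("a great and fun library", model))
print(classify("terrible and slow", model))
```

A real analyzer is trained on far more data, but the mechanics are the same: multiply a class prior by per-word likelihoods and pick the most probable label.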

TextBlob is a simple, fun library that makes text analysis a joy. At a minimum, we use TextBlob
for initial prototyping on almost every NLP project.

Resources

TextBlob Documentation - Official documentation and quickstart guide.
Natural Language Processing Basics with TextBlob - Excellent, short NLP crash
course using TextBlob.

The Mercenary: Stanford CoreNLP


Stanford CoreNLP is a suite of production-ready natural language analysis tools. It includes
part-of-speech (POS) tagging, entity recognition, pattern learning, parsing, and much more.

"The Mercenary" is actually written in Java, not Python. You can get around this with
Python wrappers made by the community.
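
As a rough illustration of how a thin Python wrapper can talk to CoreNLP, the sketch below sends text to a CoreNLP server over HTTP using only the standard library. It assumes you have started the CoreNLP server separately and that it listens at `http://localhost:9000`. The helper names `build_url`, `annotate`, and `word_pos_pairs` are made up for this example, and the community wrappers offer richer interfaces.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server was started separately and listens here.
SERVER_URL = "http://localhost:9000"


def build_url(annotators, server_url=SERVER_URL):
    """Encode the annotator list as the `properties` query parameter."""
    properties = {"annotators": ",".join(annotators), "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{server_url}/?{query}"


def annotate(text, annotators=("tokenize", "ssplit", "pos")):
    """POST raw text to the server and return the parsed JSON response."""
    request = urllib.request.Request(
        build_url(annotators), data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def word_pos_pairs(result):
    """Flatten a JSON response into (word, POS tag) pairs."""
    return [
        (token["word"], token["pos"])
        for sentence in result.get("sentences", [])
        for token in sentence.get("tokens", [])
    ]
```

With a server running, `word_pos_pairs(annotate("Stanford CoreNLP is fast."))` would return one (word, tag) pair per token. Keeping the Java process behind an HTTP boundary is one common way to use CoreNLP from Python without touching the JVM directly.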

Many organizations use CoreNLP for production implementations. It's fast, accurate, and
able to support several major languages.

Resources

CoreNLP Documentation - Official documentation and resource compilation.
List of Python wrappers for CoreNLP - Kept up-to-date by Stanford NLP.

The Usurper: spaCy


SpaCy is the new kid on the block, and it's making quite a splash. It's marketed as an
"industrial-strength" Python NLP library that's geared toward performance.

SpaCy is minimal and opinionated, and it doesn't flood you with options like NLTK does.
Its philosophy is to only present one algorithm (the best one) for each purpose. You don't
have to make choices, and you can focus on being productive.


Because it's built on Cython, it's also lightning-fast. Folks have called spaCy "state-of-the-art,"
and it's hard to disagree. Its main weakness is that it currently only supports
English.

SpaCy is newer, so its support community is not as large as some other libraries'. Yet, its
approach to NLP is so compelling that it could possibly dethrone NLTK.


If you're building a new application or revamping an old one (and you only need English
support), then we strongly recommend trying spaCy.

Resources

spaCy Documentation - Official documentation and quickstart guide.
Intro to NLP with SpaCy - Short tutorial showcasing spaCy's functionality.

The Admiral: gensim


Last but not least, we have gensim. Gensim is not for every challenge, but what it does do,
it does well. You don't send your admiral to a land battle, and you don't use gensim
for general NLP.

Gensim is a well-optimized library for topic modeling and document similarity analysis.
Among the Python NLP libraries listed here, it's the most specialized.

Even so, it's a valuable tool to add to your repertoire. Its topic modeling algorithms, such
as its Latent Dirichlet Allocation (LDA) implementation, are best-in-class. In addition, it's
robust, efficient, and scalable.

Plus, the subfield of semantic analysis (or topic modeling) is one of the most exciting
areas of modern natural language processing.

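To give a feel for what document similarity analysis involves, here is a minimal bag-of-words cosine similarity sketch in plain Python. It does not use gensim or reproduce its API: the `bag_of_words` and `cosine_similarity` helpers and the sample sentences are assumptions for illustration. Gensim builds on the same vector-space idea but adds streamed corpora and more refined models such as TF-IDF and LDA.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical mini-corpus and query.
docs = [
    "the admiral commands the fleet at sea",
    "the fleet sails the sea",
    "topic models summarize large text collections",
]
query = bag_of_words("fleet at sea")

# Score each document against the query.
for doc in docs:
    print(round(cosine_similarity(query, bag_of_words(doc)), 3), doc)
```

Documents that share words with the query score above zero, while the unrelated third document scores zero. Raw counts are crude, which is one reason weighted and topic-based representations exist.
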
Resources

gensim Documentation - Official documentation and tutorials. The tutorials page is
very helpful.

Copyright 2017 EliteDataScience.com All Rights Reserved

https://elitedatascience.com/python-nlp-libraries
