Week 1: Write a Python program to perform tokenization by word and
sentence using NLTK.
Program for sentence tokenization:
import nltk
# Download the Punkt tokenization models
# (newer NLTK releases may need 'punkt_tab' instead)
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

def tokenize_sentences(text):
    # Split the text into a list of sentences
    sentences = sent_tokenize(text)
    return sentences

# Adjacent string literals inside parentheses are joined into one string
text = ("NLTK is a leading platform for building Python programs to work with human language "
        "data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, "
        "along with a suite of text processing libraries for classification, tokenization, stemming, tagging, "
        "parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active "
        "discussion forum.")
sentences = tokenize_sentences(text)
for i, sentence in enumerate(sentences):
    print(f"Sentence {i+1}: {sentence}")
# Quick check: word_tokenize splits a contraction into separate tokens
from nltk.tokenize import word_tokenize
print(word_tokenize("won't"))
Program for word tokenization:
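import nltk
# Download the Punkt tokenization models
# (newer NLTK releases may need 'punkt_tab' instead)
nltk.download('punkt')
from nltk.tokenize import word_tokenize

def tokenize_words(text):
    # Split the text into a list of word and punctuation tokens
    words = word_tokenize(text)
    return words

# Adjacent string literals inside parentheses are joined into one string
text = ("NLTK is a leading platform for building Python programs to work with human language "
        "data.")
words = tokenize_words(text)
print(words)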