
Week 1: Write a Python program to perform tokenization by word and sentence using NLTK.

Program for sentence tokenization:

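import nltk
from nltk.tokenize import sent_tokenize

# Download the Punkt sentence tokenization models (only needed once).
# Recent NLTK releases load 'punkt_tab'; older releases use 'punkt'.
nltk.download('punkt')
nltk.download('punkt_tab')

def tokenize_sentences(text):
    """Split text into a list of sentences using NLTK's Punkt tokenizer."""
    sentences = sent_tokenize(text)
    return sentences

# Adjacent string literals inside parentheses are joined into one string,
# so the long text can span several source lines.
text = ("NLTK is a leading platform for building Python programs to work with human language "
        "data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, "
        "along with a suite of text processing libraries for classification, tokenization, stemming, tagging, "
        "parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active "
        "discussion forum.")

sentences = tokenize_sentences(text)

# Print each sentence with a 1-based index.
for i, sentence in enumerate(sentences):
    print(f"Sentence {i+1}: {sentence}")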
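For comparison, a naive splitter that breaks after every ".", "!" or "?" shows why a trained model such as Punkt is useful. The function below is a hypothetical illustration, not part of NLTK: it wrongly splits after the abbreviation "Dr.", which Punkt's pretrained English model is designed to recognize.

```python
import re

def naive_sentence_split(text):
    """Split on '.', '!' or '?' followed by whitespace.

    Hypothetical helper for illustration only; this is NOT the Punkt algorithm.
    """
    # The lookbehind keeps the terminal punctuation attached to its sentence;
    # only the whitespace after it is consumed by the split.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "Dr. Smith uses NLTK. It splits text into sentences."
for i, s in enumerate(naive_sentence_split(sample)):
    print(f"Sentence {i+1}: {s}")
# The first "sentence" is just "Dr.", an error a trained model avoids.
```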

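Example: word tokenization of a contraction:

from nltk.tokenize import word_tokenize

# Contractions are split into two tokens; this prints ['wo', "n't"]
print(word_tokenize("won't"))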
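NLTK's word tokenizer follows Penn Treebank conventions, which split contractions into separate tokens (for example, "won't" becomes "wo" and "n't"). The sketch below imitates just that one rule with a regular expression. The pattern and the function name split_contractions are illustrative assumptions; NLTK's real tokenizer applies a much larger rule set covering punctuation, quotes and other special cases.

```python
import re

# Clitic suffixes split off in Penn Treebank style.
# Hypothetical, simplified subset of what NLTK's word tokenizer does.
CLITIC = re.compile(r"(?i)(\w)(n't|'s|'re|'ll|'ve|'d|'m)\b")

def split_contractions(text):
    """Insert a space before each clitic suffix, then split on whitespace."""
    return CLITIC.sub(r"\1 \2", text).split()

print(split_contractions("won't"))                    # ['wo', "n't"]
print(split_contractions("She's sure they'll come"))  # ['She', "'s", 'sure', 'they', "'ll", 'come']
```

Splitting before the "n" (rather than at the apostrophe) is what turns "won't" into "wo" + "n't" and "don't" into "do" + "n't", so the negation always appears as the same token.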

Program for word tokenization:

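import nltk
from nltk.tokenize import word_tokenize

# word_tokenize first splits the text into sentences, so it also needs Punkt.
# Recent NLTK releases load 'punkt_tab'; older releases use 'punkt'.
nltk.download('punkt')
nltk.download('punkt_tab')

def tokenize_words(text):
    """Split text into word and punctuation tokens (Penn Treebank style)."""
    words = word_tokenize(text)
    return words

text = ("NLTK is a leading platform for building Python programs to work with human language "
        "data.")

words = tokenize_words(text)
print(words)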
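A plain regular expression gives a rough baseline to compare against word_tokenize. The function regex_word_tokenize below is a hypothetical helper (NLTK ships a comparable regex-based tokenizer, wordpunct_tokenize). It matches runs of word characters or single punctuation marks; on the sample sentence it produces the same tokens as word_tokenize, but it splits contractions at the apostrophe instead of following Treebank conventions.

```python
import re

def regex_word_tokenize(text):
    """Return runs of word characters, or single non-space punctuation marks.

    Hypothetical baseline for comparison; not NLTK's word_tokenize.
    """
    return re.findall(r"\w+|[^\w\s]", text)

text = ("NLTK is a leading platform for building Python programs to work with human language "
        "data.")
print(regex_word_tokenize(text))
print(regex_word_tokenize("won't"))  # ['won', "'", 't']
```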

mona