Sometimes we need the start and end index of every word in a sentence. A sentence consists of words separated by spaces. This Python article shows two ways of finding the start and end indices of all words in a given string, each with its own example. Example 1 iterates over the characters of the string and treats every space as the boundary between two words. Example 2 uses the Natural Language Toolkit (NLTK) to find the start and end indices of all words in a string.
Step 1 - Define a string and name it givenStr.
Step 2 - Create a function called StartandEndIndex that takes givenStr, iterates over it, checks for spaces, and returns a list of tuples with the start and end indices of all words.
Step 3 - Create a word list using the split method.
Step 4 - Use the values from the two lists above and create a dictionary.
Step 5 - Run the program and check the results.
# function that returns the start and end indices of every word
def StartandEndIndex(givenStr):
    indexList = []
    startNum = 0
    lengthOfSentence = len(givenStr)
    # iterate through the given string
    for indexitem in range(0, lengthOfSentence):
        # a space marks the end of the current word
        if givenStr[indexitem] == " ":
            indexList.append((startNum, indexitem - 1))
            # the next word starts right after the space
            startNum = indexitem + 1
    # add the last word, which is not followed by a space
    if startNum != len(givenStr):
        indexList.append((startNum, len(givenStr) - 1))
    return indexList

givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'

# call the function StartandEndIndex(givenStr)
# and get the list having starting and ending indices of all words
indexListt = StartandEndIndex(givenStr)

# make a list of words separately
listofwords = givenStr.split()
print("\nThe given String or Sentence is ")
print(givenStr)
print("\nThe list of words is ")
print(listofwords)

# make a dictionary using words and their indices
resDict = {listofwords[indx]: indexListt[indx] for indx in range(len(listofwords))}
print("\nWords and their indices : " + str(resDict))
To see the results, run the Python file in a cmd window.

The given String or Sentence is
Keep your face always toward the sunshine and shadows will fall behind you

The list of words is
['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']

Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}
Figure 1: Displaying results in the command window.
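The character-by-character loop in Example 1 assumes exactly one space between words, and the dictionary keeps only one entry per distinct word. As a minimal sketch of an alternative (not part of the original example), the standard `re` module can locate each run of non-whitespace characters directly. The function name `word_spans` is hypothetical and chosen only for illustration.

```python
import re

def word_spans(text):
    """Return (word, start, end) triples, where end is the index of the word's last character."""
    # \S+ matches each maximal run of non-whitespace characters,
    # so repeated spaces never produce empty entries
    return [(m.group(), m.start(), m.end() - 1) for m in re.finditer(r"\S+", text)]

# sample sentence with a double space and a repeated word (assumed test data)
sample = "Keep  your face and your hopes"
for word, start, end in word_spans(sample):
    print(word, start, end)
```

Returning a list of triples instead of a dictionary means that a word occurring twice, such as "your" in the sample, keeps both of its positions.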
Step 1 - Install nltk using the pip command, then import align_tokens from nltk.tokenize.util.
Step 2 - Take givenStr as the test string, split it into words using the split method, and call the result listofwords.
Step 3 - Call align_tokens with listofwords as the tokens and givenStr as the sentence.
Step 4 - It returns a list of (start, end) offsets in which each end value is exclusive, that is, one position past the last character of the word. Subtract one from each end value to get the index of the word's last character.
Step 5 - Use the values from the two lists above and create a dictionary.
Step 6 - Run the program and check the results.
# Use pip install nltk to install this library
# import align_tokens
from nltk.tokenize.util import align_tokens

# specify a string for testing
givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'

# make a list of words
listofwords = givenStr.split()
print("\nThe given String or Sentence is ")
print(givenStr)
print("\nThe list of words is ")
print(listofwords)

# align_tokens returns (start, end) offsets whose end value is exclusive,
# i.e. one position past the last character of each word
indices_includingspace = align_tokens(listofwords, givenStr)
indices_withoutspace = []

# reduce the end value by one to get the index of the last character
for start, end in indices_includingspace:
    indices_withoutspace.append((start, end - 1))
print(indices_withoutspace)

# make the dictionary of all words in a string with their indices
resDict = {listofwords[indx]: indices_withoutspace[indx] for indx in range(len(listofwords))}
print("\nWords and their indices : " + str(resDict))
Open the cmd window and run the python file to view the results.

The given String or Sentence is
Keep your face always toward the sunshine and shadows will fall behind you

The list of words is
['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']
[(0, 3), (5, 8), (10, 13), (15, 20), (22, 27), (29, 31), (33, 40), (42, 44), (46, 52), (54, 57), (59, 62), (64, 69), (71, 73)]

Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}
Figure 2: Displaying words and their indexes.
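The offsets that align_tokens returns are best read as end-exclusive spans, so that slicing the sentence with a span gives back the token; this is why Example 2 subtracts one from each end value. The following is a rough standard-library sketch of that idea. The helper `align_words` is a hypothetical stand-in written for illustration, not NLTK's own code, and it assumes every token occurs in the sentence in order.

```python
def align_words(tokens, sentence):
    """Hypothetical stand-in: find each token left to right and return end-exclusive spans."""
    spans = []
    position = 0
    for token in tokens:
        # search from the end of the previous token; raises ValueError if not found
        start = sentence.index(token, position)
        position = start + len(token)
        spans.append((start, position))
    return spans

sentence = "Keep your face always toward the sunshine"
tokens = sentence.split()
spans = align_words(tokens, sentence)

# an end-exclusive span slices straight back to its token
assert all(sentence[s:e] == t for (s, e), t in zip(spans, tokens))

# subtract one from each end to get the index of the last character
inclusive = [(s, e - 1) for s, e in spans]
print(inclusive)
```

Because the search resumes after the previous token, a repeated word is matched at its next occurrence rather than at its first one.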
This Python article used two examples to show how to find the start and end index of every word in a string. Example 1 does this by iterating over all characters of the string, with spaces marking the boundary between words. Example 2 uses the nltk library, the Natural Language Toolkit. It is first installed using pip, and then the align_tokens function is imported from nltk.tokenize.util. Passing the word list as tokens together with the original string to this function gives the offsets of all words.