The choice of tokenization depends on the problem rather than on the data set. Let me explain with an example. Suppose you want to extract action items from a meeting transcript. First you split the transcript into sentences so that you can classify each one as an action item or not; for this you use a sentence tokenizer. Next, to classify each sentence, you need the weight of the words it contains, so you split each sentence into words with a word tokenizer. Finally, you assign a weight to each word using a representation such as word2vec, GloVe, or TF-IDF.
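
Here is a minimal sketch of that pipeline, assuming NLTK for the two tokenization steps and scikit-learn's `TfidfVectorizer` for the weighting step (word2vec or GloVe could be swapped in instead). The transcript text is made up for illustration.

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt", quiet=True)  # tokenizer models, needed once

# Hypothetical transcript containing both action and non-action sentences
transcript = (
    "The demo went well. John will send the revised slides by Friday. "
    "We discussed the budget. Sarah needs to book the venue before Monday."
)

# Step 1: sentence tokenization -- each sentence becomes one unit to classify
sentences = sent_tokenize(transcript)

# Step 2: word tokenization -- split each sentence into its words
tokens_per_sentence = [word_tokenize(s) for s in sentences]

# Step 3: weight the words, here with TF-IDF; each sentence becomes a
# weighted vector you can feed to an action/non-action classifier
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)

for sentence, vector in zip(sentences, X.toarray()):
    print(sentence, "->", vector.round(2))
```

The output of step 3 (one vector per sentence) is what you would then pass to whatever classifier separates action items from the rest.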