This paper proposes a multimodal approach to speaker diarization in TV talk shows that uses both audio and video data. The approach is decomposed into two levels: first, reliable training sets are formed from the TV show content; second, the audio and video data are classified with a support vector machine (SVM) within an unsupervised framework. The two modalities are combined by associating visual and audio descriptors. Once the audio and visual features have been extracted, the system collects training examples and then learns and classifies audiovisual frames using an SVM combined with Tabu search. Tabu search is introduced to improve the accuracy of the search and to reduce its time complexity. Two schemes are evaluated: an audio-only classification scheme and a parallel audio/visual classification scheme. The results indicate that the proposed method improves speaker diarization.
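As an illustration of how Tabu search might be coupled with an SVM, the following minimal sketch searches a small grid of SVM hyperparameters. It rests on assumptions that the abstract does not state: that the quantity being searched is the pair (C, gamma), and that a score such as cross-validated classification accuracy is available for each candidate. The names tabu_search, toy_score, C_GRID and GAMMA_GRID are hypothetical, and toy_score is a stand-in for a real SVM evaluation.

```python
from collections import deque
from typing import Callable, Tuple

Point = Tuple[int, int]  # (index into C_GRID, index into GAMMA_GRID)

# Hypothetical hyperparameter grids, chosen only for illustration.
C_GRID = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
GAMMA_GRID = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]


def tabu_search(score: Callable[[Point], float], shape: Point, start: Point,
                iterations: int = 50, tenure: int = 5) -> Tuple[Point, float]:
    """Maximize `score` over a 2-D grid using a basic Tabu search."""
    current = start
    best, best_score = start, score(start)
    # Short-term memory: the most recently visited points are forbidden.
    tabu = deque([start], maxlen=tenure)

    for _ in range(iterations):
        candidates = []
        # Neighborhood: one step along either grid axis.
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            i, j = current[0] + di, current[1] + dj
            if 0 <= i < shape[0] and 0 <= j < shape[1]:
                point = (i, j)
                s = score(point)
                # Aspiration: a tabu point is allowed if it beats the best so far.
                if point not in tabu or s > best_score:
                    candidates.append((s, point))
        if not candidates:
            break  # every neighbor is tabu and none improves on the best
        # Move to the best admissible neighbor, even if it is worse than
        # the current point; this is what lets the search leave local optima.
        s, current = max(candidates)
        tabu.append(current)
        if s > best_score:
            best, best_score = current, s
    return best, best_score


def toy_score(point: Point) -> float:
    """Stand-in for SVM validation accuracy, peaked at grid index (3, 2)."""
    i, j = point
    return -((i - 3) ** 2 + (j - 2) ** 2)


best, best_score = tabu_search(toy_score, (len(C_GRID), len(GAMMA_GRID)), (0, 0))
print("C =", C_GRID[best[0]], "gamma =", GAMMA_GRID[best[1]], "score =", best_score)
```

In a real system, toy_score would be replaced by a function that trains an SVM on the collected audiovisual training set with the selected parameters and returns its validation accuracy. The tabu list keeps the search from revisiting recent candidates, so fewer SVM trainings are repeated than in a naive local search.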