
With the dramatic growth of data in digital form, managing very large collections of documents has become difficult. When an individual searches the internet for information on a particular topic, the search may return a large set of documents. Some of these may be in .pdf format, some in .txt format, and others Word documents. The titles of these documents may appear relevant to what the individual is looking for, but their content may differ. There is thus a need to read, understand, and analyze the contents of all the documents at a glance, and it has become necessary to categorize large texts (documents) into specific classes. Our proposed system classifies documents, either a single document or multiple documents, into predefined classes. The documents can be in any of the following formats: .txt, .doc, .docx, or .pdf. Preprocessing techniques such as tokenization, stop-word removal, and stemming are then applied to the input files. Each document is classified according to the learning data provided. Dynamic learning is used to update the learning datasets. This project covers how document classification is performed and how the desired output (the classified documents) is determined. We also aim to generate a classification report giving the number of documents in each class relative to the total number of documents. A pie chart can also show why a particular document leans toward a particular category and what percentage of its content is related to that category.
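The pipeline described above (preprocessing, classification against learned data, dynamic updates, and a per-class report) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the system itself: text has already been extracted from the .txt, .doc, .docx, or .pdf files; the stop-word list is a tiny hypothetical one; the stemmer is a crude suffix stripper standing in for a real stemming algorithm; and the classifier is a simple term-matching scheme chosen only for illustration, since the classification method is not named here. All names (`KeywordClassifier`, `category_shares`, `classification_report`, and so on) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to", "on", "for", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


class KeywordClassifier:
    """Assumed classifier: scores a document by the learned terms it matches per class."""

    def __init__(self):
        self.class_terms = {}  # class name -> Counter of stemmed terms

    def learn(self, label, text):
        """Add a labelled document to the learning data.

        Calling this again later plays the role of a dynamic learning update.
        """
        self.class_terms.setdefault(label, Counter()).update(preprocess(text))

    def category_shares(self, text):
        """Return, per class, the percentage of the document's matched terms."""
        terms = preprocess(text)
        hits = {
            label: sum(1 for t in terms if t in known)
            for label, known in self.class_terms.items()
        }
        total = sum(hits.values())
        if total == 0:
            return {label: 0.0 for label in hits}
        return {label: 100.0 * n / total for label, n in hits.items()}

    def classify(self, text):
        """Pick the class with the largest share (first class wins on a tie)."""
        shares = self.category_shares(text)
        return max(shares, key=shares.get)


def classification_report(labels):
    """Percentage of documents assigned to each class, relative to the total."""
    counts = Counter(labels)
    return {label: 100.0 * n / len(labels) for label, n in counts.items()}


clf = KeywordClassifier()
clf.learn("sports", "The team scored goals in the football match")
clf.learn("finance", "Banks raised interest rates and stock markets dropped")

documents = ["The match ended with two goals", "Stock markets and banks"]
predicted = [clf.classify(d) for d in documents]
report = classification_report(predicted)
```

In this sketch, `category_shares` yields the per-document percentages that a pie chart could display, and `classification_report` yields the share of documents per class. A production system would likely replace the suffix stripper with an established stemmer and the term-matching scheme with a trained statistical classifier.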