Tree Mining


Structure mining, or structured data mining, is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are special cases of structured data mining.

The growing use of semi-structured data has created new opportunities for data mining, which has traditionally been concerned with tabular data sets, reflecting the strong association between data mining and relational databases. Much of the world's interesting and mineable data does not fit easily into relational databases, although a generation of software engineers has been trained to believe that this is the only way to handle data, and data mining algorithms have generally been developed to cope only with tabular data.

XML, the most common format for semi-structured data, can represent both tabular data and arbitrary trees. Any particular XML representation of data exchanged between two applications is normally described by a schema, often written in XSD (XML Schema Definition). Practical schemata, NewsML for instance, are normally very sophisticated and contain many optional subtrees used to represent special-case data. Often around 90% of a schema is concerned with defining these optional data items and subtrees.
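To make the point about optional subtrees concrete, here is a minimal Python sketch that measures what share of the element declarations in a schema are optional (minOccurs="0"). The schema TOY_XSD is a hypothetical five-declaration stand-in, not NewsML, and the helper optional_fraction is illustrative only. It ignores optional attributes and minOccurs set on sequence or choice groups, so it gives only a rough figure.

```python
import xml.etree.ElementTree as ET

# Namespace URI of XML Schema; ElementTree expands "xs:element" to this form.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical toy schema (NOT NewsML): one required child and three
# optional ones (minOccurs="0"), mimicking special-case subtrees.
TOY_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="headline" type="xs:string"/>
        <xs:element name="byline" type="xs:string" minOccurs="0"/>
        <xs:element name="correction" type="xs:string" minOccurs="0"/>
        <xs:element name="embargo" type="xs:date" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def optional_fraction(xsd_text: str) -> float:
    """Share of xs:element declarations marked minOccurs="0" (hypothetical helper)."""
    root = ET.fromstring(xsd_text)
    # iter() walks the whole tree, so nested declarations are counted too.
    elements = list(root.iter(XS + "element"))
    optional = [e for e in elements if e.get("minOccurs") == "0"]
    return len(optional) / len(elements) if elements else 0.0


print(optional_fraction(TOY_XSD))  # 3 of 5 declarations -> 0.6
```

A real schema would also need group-level and attribute-level optionality taken into account before a figure like the 90% above could be checked.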

Consequently, messages and data that are transmitted or encoded in XML and conform to the same schema can contain very different data, depending on what is being transmitted.

Such data poses large problems for conventional data mining. Two messages that conform to the same schema may have little data in common. If one builds a training set from such data and formats it as tabular data for conventional data mining, large sections of the tables will be empty.
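The tabular problem can be sketched in a few lines of Python, assuming each document is flattened into one column per root-to-leaf element path. The two messages MSG_A and MSG_B are hypothetical examples that could conform to one schema while populating different optional subtrees; leaf_paths and to_table are illustrative helper names, not part of any standard tool.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical messages: both have a headline, but each fills in
# different optional elements.
MSG_A = "<message><headline>Rates rise</headline><byline>J. Doe</byline></message>"
MSG_B = ("<message><headline>Storm warning</headline>"
         "<embargo>2024-01-01</embargo><correction>Updated time</correction></message>")


def leaf_paths(xml_text: str) -> dict[str, str]:
    """Flatten an XML document into {root-to-leaf path: text} pairs."""
    result: dict[str, str] = {}

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}"
        children = list(node)
        if not children:
            result[path] = (node.text or "").strip()
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return result


def to_table(docs: list[str]) -> tuple[list[str], list[list[str | None]]]:
    """Force documents into one table with a column for every path seen anywhere."""
    rows = [leaf_paths(d) for d in docs]
    columns = sorted(set().union(*rows))
    # A path missing from a document becomes None, i.e. an empty cell.
    return columns, [[row.get(c) for c in columns] for row in rows]


columns, table = to_table([MSG_A, MSG_B])
cells = [cell for row in table for cell in row]
empty = sum(cell is None for cell in cells)
print(columns)
print(f"{empty} of {len(cells)} cells are empty")  # 3 of 8 cells are empty
```

Only /message/headline is shared by both messages; with a realistic schema containing hundreds of optional paths, the proportion of empty cells grows accordingly.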

Most data mining algorithms are designed on the tacit assumption that the data presented will be complete. Mining data of this kind therefore also requires that the algorithms employed, whether supervised or unsupervised, be able to handle sparse data.
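One way to avoid padding, sketched here in Python under stated assumptions, is to store each document sparsely as the set of element paths it actually contains and to count how many documents support each path. This is a simplified, path-based stand-in for frequent subtree mining, not a full tree-mining algorithm, which would count whole subtree patterns rather than single paths. The documents in DOCS and the helpers element_paths and frequent_paths are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical documents sharing a schema but using different optional parts.
DOCS = [
    "<message><headline/><byline/></message>",
    "<message><headline/><embargo/></message>",
    "<message><headline/><byline/><correction/></message>",
]


def element_paths(xml_text: str) -> set[str]:
    """All root-to-node element paths present in one document (a sparse record)."""
    paths: set[str] = set()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def frequent_paths(docs: list[str], min_support: float) -> dict[str, float]:
    """Paths occurring in at least min_support (a fraction) of the documents."""
    records = [element_paths(d) for d in docs]
    # Each record is a set, so counts are document frequencies, not occurrences.
    counts = Counter(p for record in records for p in record)
    threshold = min_support * len(records)
    return {p: c / len(records) for p, c in counts.items() if c >= threshold}


for path, support in sorted(frequent_paths(DOCS, 0.5).items()):
    print(path, round(support, 2))
```

With a 50% threshold this keeps /message, /message/headline and /message/byline (two of three documents) and drops the rarer optional paths. Absent subtrees never appear in a record, so no empty cells are created.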

Chemical Informatics is a journal published by Insight Medical Publishing (iMedPub), and chemical informatics is also one of the fastest-emerging fields today. It is a multidisciplinary field covering research on molecular design tools for finding the compounds that best fit particular targets.

Chemical informatics is a broad field concerned with the design, creation, structure, dissemination, visualization and use of chemical information. The Chemical Informatics journal aims to provide scientists with resources and scientific knowledge by publishing high-quality, peer-reviewed scientific papers and other material on all topics related to chemical information, software and databases.

Submission

Articles can be submitted through the online Editor Tracking System or by e-mail to the addresses provided on the journal's site.

Submit manuscripts at http://www.imedpub.com/submissions/chemical-informatics.html or as an e-mail attachment to the editorial office at chemicalinformatics@chemistryjournals.org


Contact

Elsa
Journal Manager

WhatsApp: +44-20-3608-4181
Chemical Informatics-Open Access