The problems posed to data mining are often complex, and the expected answers carry high added value; they require an experienced eye. The tools, too, demand a great deal of knowledge, and a little courage to work through their documentation. Anything else would take a miracle: imagine effortlessly detecting hidden information, usually buried in noise, that defies intuition and whose accuracy is revealed only globally.
However, data mining can give a misleading impression of ease. Indeed, whatever the method, whoever manipulates data always produces a result. This is encouraging, and it makes the task seem easy. But beyond the wide range of reasonable answers lies an even greater number of fanciful ones (produced by a mismatch between the data, the objectives, the methodology, and the various parameters). These mirages are common; they simply show things that do not exist instead of things that are hidden. It is therefore necessary to prune away these wrong answers and to select, among the reasonable results, those that best meet the objectives.
If we were to retain only one recommendation, it would be this: never stop at the first result. Calculate, recalculate, compare, interpret, refine…
There is another common-sense piece of advice for avoiding mirages: rely on robust results. This is not empty talk; it is genuine advice. A result is valid, hence robust, if it applies to every data set allowed by the class of problems we wish to solve. In other words, if you obtain a result by applying a method to 100 data points, you should verify that applying the same method to 100 other data points collected by the same process produces a result reasonably close to the previous one.
In practice, things are often done slightly differently: you keep the 100 data points from start to finish, but you torture only 60 of them. Once a pleasing solution emerges, it is applied, without further adjustment, to the 40 remaining data points, or to all 100. This gives a first idea of the robustness of the solution.
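The hold-out procedure described above can be sketched in a few lines of code. This is only a minimal illustration: the synthetic 100-point data set, the 10% label noise, and the simple threshold rule are hypothetical stand-ins for a real mining task.

```python
import random

random.seed(0)

# Hypothetical data set of 100 observations: x is drawn uniformly;
# y follows the rule "1 when x > 0.5", with 10% label noise added.
data = []
for _ in range(100):
    x = random.random()
    y = 1 if x > 0.5 else 0
    if random.random() < 0.1:  # 10% label noise
        y = 1 - y
    data.append((x, y))

# "Torture" only the first 60 observations; hold out the other 40.
train, held_out = data[:60], data[60:]

def error(threshold, sample):
    """Fraction of points misclassified by 'predict 1 when x > threshold'."""
    wrong = sum(1 for x, y in sample if (1 if x > threshold else 0) != y)
    return wrong / len(sample)

# Search for the threshold that classifies the 60 training points best.
best = min((t / 100 for t in range(101)), key=lambda t: error(t, train))

# Apply the retained solution, unchanged, to the 40 remaining points:
# a held-out error close to the training error is a first sign of robustness.
print("train error:", error(best, train))
print("held-out error:", error(best, held_out))
```

If the error on the 40 held-out points is markedly worse than on the 60 tortured ones, the pleasing solution was likely a mirage fitted to the noise.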