Learning about KnowledgeMiner the Easy Way
By Bert Altenburg
KnowledgeMiner is probably unlike any program you have ever
worked with before. You know what to do with, say, a word
processor. But what do you do with a data mining program?
We will show you just that. If you're a techie, you can find
lots of in-depth information in the KnowledgeMiner Tutorial
in the Help menu. However, if you're a regular
person, you may want to learn about it the easy way. And that
is what this page is about.
It is a simple deal. You provide some raw data, and we'll show
you how to get the most out of it. From your data, KnowledgeMiner
has the remarkable ability to build a reliable model that lets you make
predictions. Here is how.
Part One: Getting started
Before KnowledgeMiner can be of use to you, it has to learn.
However, there is no need for you to teach KnowledgeMiner anything.
Using the data, it will teach itself, deducing relationships
that a mathematician would have a hard time finding. All you
need is source data and target data.
From these, KnowledgeMiner will find hidden relationships
all by itself, which allows it to make predictions from new source
data you provide.
KnowledgeMiner can be used in a wide range of fields. Some
examples are given here, but the range is so wide
that your own area of interest is quite likely not among
these examples.
- If you're interested in stocks, KnowledgeMiner can help
you to predict the rise and fall of shares.
- If you're interested in medicine, KnowledgeMiner may help
you to predict the life expectancy of cancer patients.
- If you are interested in chemistry, KnowledgeMiner may
help you to predict the properties of new molecules.
Actually, just about any field that involves numbers and
unknown relationships is where KnowledgeMiner can be applied.
An example of how to work with KnowledgeMiner is given below.
We really hope it is outside your area of interest and expertise.
That would be the best demonstration that it is not necessary
to have expertise in the field to make valid predictions.
From the example, you will appreciate how to apply what you've
learned to your own data.
Okay, let's get on with it. Remember, you do not have to
know anything about the application area, here physics and
chemistry. Below you see a table. Each row contains information
on a certain hydrocarbon (gasoline is a mixture containing
a lot of such molecules).
The objective is to predict the boiling point of a hydrocarbon
using nothing more than two series of data: 1) the
number of carbon atoms of a hydrocarbon; and 2) the
molecular weight of that hydrocarbon. To teach KnowledgeMiner,
a third series of data is necessary: the boiling points
of these hydrocarbons. Thus, the columns with the
number of carbon atoms and the molecular weights represent
the source data, and the column with the boiling points represents
the target data.
Predicting the boiling point of hydrocarbons

Name of hydrocarbon | No. of carbon atoms | Boiling point (°C) | Molecular weight (g/mol) | Actual boiling point (°C)
--------------------+---------------------+--------------------+--------------------------+--------------------------
Methane             | 1                   | -164.0             | 16.04                    |
Ethane              | 2                   | -88.6              | 30.07                    |
Butane              | 4                   | -0.5               | 58.12                    |
Hexane              | 6                   | 69.0               | 86.18                    |
Heptane             | 7                   | 98.4               | 100.21                   |
Nonane              | 9                   | 150.8              | 128.26                   |
Decane              | 10                  | 174.1              | 142.29                   |
Dodecane            | 12                  | 216.3              | 170.34                   |
Octane              | 8                   |                    | 114.23                   | 125.7
Pentane             | 5                   |                    | 72.15                    | 36.1
Propane             | 3                   |                    | 44.11                    | -42.1
In general terms: for any project you
will need two (or more) columns of source data and one column
with target data.
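To make these roles concrete, here is a minimal Python sketch (not part of KnowledgeMiner) that holds the table above as plain Python data, picks out the two source columns (X1 and X2) and the target column (Y), and separates the rows with a known boiling point, which are used for learning, from the rows whose boiling point is blank and still has to be predicted. The names `TABLE` and `split_table` are hypothetical, and `None` stands for an empty cell.

```python
# Each row: (name, no. of carbon atoms, boiling point in °C or None, molecular weight in g/mol)
TABLE = [
    ("Methane", 1, -164.0, 16.04),
    ("Ethane", 2, -88.6, 30.07),
    ("Butane", 4, -0.5, 58.12),
    ("Hexane", 6, 69.0, 86.18),
    ("Heptane", 7, 98.4, 100.21),
    ("Nonane", 9, 150.8, 128.26),
    ("Decane", 10, 174.1, 142.29),
    ("Dodecane", 12, 216.3, 170.34),
    ("Octane", 8, None, 114.23),   # boiling point to be predicted
    ("Pentane", 5, None, 72.15),   # boiling point to be predicted
    ("Propane", 3, None, 44.11),   # boiling point to be predicted
]


def split_table(table):
    """Return (source rows, target values) for learning, plus the source rows to predict."""
    learn_x, learn_y, predict_x = [], [], []
    for _name, carbons, boiling_point, weight in table:
        source = (carbons, weight)          # X1, X2: the source data
        if boiling_point is None:
            predict_x.append(source)        # no target yet: this row needs a prediction
        else:
            learn_x.append(source)
            learn_y.append(boiling_point)   # Y: the target data
    return learn_x, learn_y, predict_x


learn_x, learn_y, predict_x = split_table(TABLE)
print(len(learn_x), len(predict_x))  # 8 3
```

The eight learning rows are the ones that count toward the data length you enter in step 3.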
Without quitting Netscape Navigator (you probably don't want
to miss this page!), open KnowledgeMiner. Click the close
box of the window that opens automatically and open the folder
Chemistry/Ecology (it is in the Examples folder). Open the
file BoilingPoint. There you see all the data from the table
shown above.
Now take the following steps:
1. You have to tell KnowledgeMiner
which column contains the target data. To do so, click
the cell labeled 'Boiling point'. Note that
in the top row (listing variables as X1, X2, X3 ...) the heading
of this column now reads Y. This indicates
that this column contains the target data used for learning.
Your screen should look as depicted below, although the highlight
color used to select the cell may differ from the one shown
here. (You can change the highlight color of the selection
in the Appearance control panel.)
2. You have to tell KnowledgeMiner
which columns contain the source data, that is, which columns
you want KnowledgeMiner to use to build the model. To
do so, hold down the Command key (next to the spacebar, the
one with the cloverleaf and/or Apple symbol on it) and click
the cells labeled 'No. of carbon atoms' and 'Molecular
weight'. The minimum number of columns you have to
select is 2, which may seem a bit silly in this particular
case, but do it anyway. Now your screen should look like this.
3. You have to tell KnowledgeMiner
to start building the model. Choose Create Input-Output-Model
from the Modeling menu. You will now see a dialog
box named Input-Output-Model: Settings.
On the left you can verify that the Output variable is the
boiling point, and that the Input variables are the number
of carbon atoms and the molecular weight.
We are going to use 8 data sets (rows) for learning: the eight
hydrocarbons whose boiling points are given. In the box Input data,
enter 8 as the data length.
(Note: As we are not building time-dependent models, the
maximum time lag is 0. We will get into time-dependent
modeling later.)
You can also choose between a linear and a nonlinear model.
Do not worry about making the wrong choice. If you have no idea,
just choose nonlinear permissible. If a linear model
would give the best results and you unknowingly opt for
nonlinear models, there is really no problem: in that case,
KnowledgeMiner will end up with the best linear model anyway.
So it is best to choose nonlinear permissible, although it
may take your Mac a little longer to find the optimum model.
Finally, to build the model, click Modeling (or press
the Return key). KnowledgeMiner starts building the model,
and as soon as it is ready, it presents you with a graph showing
both the original (target) data and the values approximated
by the model. The more the red
and blue lines overlap, the better
the model fits the data.
If you experiment with both the linear and the nonlinear
model, you will notice that the nonlinear model gives better
results here.
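To put a number on "how much the lines overlap", one common measure is the root-mean-square (RMS) difference between the target values and the model's approximations. The Python sketch below is a minimal illustration under stated assumptions: it uses ordinary least squares as a stand-in for KnowledgeMiner's own modeling method, which it does not reproduce, and for simplicity it uses only X1, because for these molecules the molecular weight is almost exactly 14.03 × X1 + 2.02, so X2 adds little extra information. Since a quadratic model contains every straight line as a special case (set the X1² coefficient to 0), its best fit on the learning rows can never be worse than the best linear fit, which echoes the advice above to choose nonlinear permissible. All function names are hypothetical.

```python
import math

# Learning rows from the table: X1 = no. of carbon atoms, Y = boiling point (°C)
X1 = [1, 2, 4, 6, 7, 9, 10, 12]
Y = [-164.0, -88.6, -0.5, 69.0, 98.4, 150.8, 174.1, 216.3]


def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows, y):
    """Ordinary least squares: solve the normal equations (AᵀA)β = Aᵀy."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)


def approximate(beta, rows):
    """Model values for each row (the second line in the graph)."""
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def rms_error(actual, approx):
    """Root-mean-square distance between target and model; 0 means the lines overlap exactly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, approx)) / len(actual))


linear_rows = [(1.0, x) for x in X1]             # Y ≈ a + b·X1
quadratic_rows = [(1.0, x, x * x) for x in X1]   # Y ≈ a + b·X1 + c·X1²

rms_linear = rms_error(Y, approximate(fit(linear_rows, Y), linear_rows))
rms_quadratic = rms_error(Y, approximate(fit(quadratic_rows, Y), quadratic_rows))
print(f"linear RMS: {rms_linear:.1f} °C, quadratic RMS: {rms_quadratic:.1f} °C")
```

On these eight rows the quadratic RMS comes out clearly lower than the linear one, because the boiling point rises less and less steeply as the number of carbon atoms grows.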
Part Two: Making predictions
There are two ways to make predictions. (First complete
the three steps described in Part One.)
I) Using a spreadsheet program.
The first way is to use a spreadsheet program. Unlike a neural
network program, KnowledgeMiner tells you what relation
it discovered. Just choose Model Equation from
the Window menu. You can enter the equation displayed
there into a spreadsheet program. (Note: In a
formula, e means "times ten to the power of". For example,
1e3 equals 1000. If your formula reads + 5.74e+1X1, it means
57.4 * X1.)
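This e-notation is the same one most programming languages understand, so a coefficient is easy to check. The Python sketch below converts the two numbers from the note and also evaluates a hypothetical linear equation written in the style shown there (a signed number in e-notation, optionally followed by a variable name such as X1). The equation string and the names `TERM` and `evaluate_linear` are made up for illustration; the parser assumes only that style and does not handle products of variables, which a nonlinear model may contain.

```python
import re

# One term: an optional sign, a number that may use e-notation, and an optional variable like X1.
TERM = re.compile(r"([+-]?)\s*(\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)(X\d+)?")


def evaluate_linear(equation, values):
    """Evaluate terms such as '+ 5.74e+1X1' for the given variable values."""
    total = 0.0
    for sign, number, variable in TERM.findall(equation):
        coefficient = float(number)  # float() understands e-notation: '5.74e+1' -> 57.4
        if sign == "-":
            coefficient = -coefficient
        total += coefficient * (values[variable] if variable else 1.0)
    return total


print(float("1e3"))      # 1000.0
print(float("5.74e+1"))  # 57.4

# Hypothetical equation: Y = -120 + 57.4 * X1, evaluated at X1 = 2
print(evaluate_linear("-1.2e+2 + 5.74e+1X1", {"X1": 2}))
```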
II) Using KnowledgeMiner
There is no need for a spreadsheet program, however. KnowledgeMiner
can do it for you too. Here's how.
Choose What-If-Prediction from the Modeling
menu. You are presented with a dialog box named What-If-Prediction.
Enter the number of rows for which you want a prediction.
Here there are three rows for which you want to predict the
boiling points, so enter 3 as the Forecast Horizon.
You probably want the data to be put into the respective cells,
so check the second check box.
The calculated data points are shown in the graph window.
If you bring the data window to the front, you will see the predicted
data in red. For comparison, the actual data are shown
in the column to the right.
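Conceptually, a what-if prediction applies the model to source data whose target is unknown and writes the results into the empty cells. The Python sketch below mimics that under stated assumptions: it uses a simple stand-in model (an ordinary least-squares quadratic in X1, the number of carbon atoms, solved with Cramer's rule), not KnowledgeMiner's own method, so its numbers will differ from the ones KnowledgeMiner shows. It fills in the three blank boiling points and prints them next to the actual values. The names `LEARN`, `PREDICT`, `det3` and `fit_quadratic` are hypothetical.

```python
# Learning rows: (X1 = no. of carbon atoms, Y = boiling point in °C)
LEARN = [(1, -164.0), (2, -88.6), (4, -0.5), (6, 69.0),
         (7, 98.4), (9, 150.8), (10, 174.1), (12, 216.3)]

# Rows to predict: [name, X1, boiling point (empty cell), actual boiling point for comparison]
PREDICT = [["Octane", 8, None, 125.7],
           ["Pentane", 5, None, 36.1],
           ["Propane", 3, None, -42.1]]


def det3(m):
    """Determinant of a 3×3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(points):
    """Least-squares fit of Y ≈ a + b·X1 + c·X1², solving the normal equations by Cramer's rule."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]     # Σx⁰ up to Σx⁴
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # Σy, Σxy, Σx²y
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = det3(normal)
    coefficients = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side t
        replaced = [row[:col] + [t[i]] + row[col + 1:] for i, row in enumerate(normal)]
        coefficients.append(det3(replaced) / d)
    return coefficients  # [a, b, c]


a, b, c = fit_quadratic(LEARN)
for row in PREDICT:
    x1 = row[1]
    row[2] = a + b * x1 + c * x1 ** 2  # write the prediction into the empty cell
    print(f"{row[0]:8s} predicted {row[2]:7.1f} °C   actual {row[3]:7.1f} °C")
```

Writing the result into `row[2]` plays the role of the second check box, which puts the predictions into the table cells.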
If you want the actual data in the graph window as well,
you have to put them below the data to be predicted.
Before choosing What-If-Prediction, click the first row as
shown below and choose Original Data Begins In This Row
from the Table menu.
Then continue with What-If-Prediction as described
above, and you will see both the predicted and the actual
data for comparison in a single graph.
-----------------------------
Bert Altenburg has also published a free book (PDF), AppleScript for Starters. You can download it here (896k).