Geeks With Blogs
James Oloo Onyango - Programming Insights: Mastery of this pervasive field of programming demands a lifetime's effort. It cannot be bought at a lesser price!
The .NET community, unlike the Java and Python communities, does not provide data mining solutions outside the framework of its RDBMS, MS SQL Server. In my quest to integrate data mining into an existing ASP.NET enterprise solution running on MySQL, I was faced with quite the task!
In reviewing existing solutions, I applied the following criteria:
  1. Open-source and free (not necessarily the same thing!)
  2. GUI for interactive data mining and exploration
  3. Concise, accessible and well-documented API for embedded data mining
  4. Wide coverage of Data Mining algorithms
  5. Vibrant user community
Following these, I narrowed the field down to WEKA and Orange.
WEKA is a Java-based collection of machine learning algorithms for solving data mining problems. It was initially developed at the University of Waikato and is now maintained by Pentaho.
Orange is a Python-based data visualization and analysis suite for data mining.
Having decided to use WEKA, my first hurdle was interoperability. A lot has been done by both the .NET and Java communities to ensure fluid interop at different levels. After careful consideration, I decided to port the code entirely to .NET. Below is a quick summary of my take on the different options.
Web Services: The obvious first choice, though I would have to babysit two servers, on the same machine or otherwise, not to mention the performance drawbacks.
Runtime Bridging (in-process): This would involve hosting both the JVM and the CLR in the same process. Tools for this purpose are mostly expensive (JNBridge, JIntegra, ...) and the open-source solutions are C++ dependent and/or unsupported.
Porting Code: Though I initially scoffed at it, I found it to be ideal for my purposes! I used IKVM.
IKVM is an implementation of Java for Mono and .NET, including a JVM and the Java class libraries implemented in .NET.
Steps
  • After downloading the WEKA source, I exported a single weka.jar file in Eclipse.
  • I downloaded the latest IKVM release and ran its static compiler, ikvmc, with the following command: > ikvmc -target:library weka.jar
  • This built a weka.dll .NET library.
  • I referenced this in a .NET project together with the JVM.DLL provided by IKVM, and voilà!
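Before running the full sample, a tiny console program can confirm that the references resolve. This is only a sketch: the class name WekaSmokeTest and the dataset name "smokeTest" are made up for illustration, and it assumes the same WEKA 3.6-era API (FastVector, Instances) used in the sample further down.

```csharp
using System;
using weka.core;
// weka.core.Attribute would otherwise clash with System.Attribute.
using Attribute = weka.core.Attribute;

// Hypothetical helper, not part of the original post.
internal static class WekaSmokeTest
{
    private static void Main()
    {
        // Build a one-attribute, zero-row dataset purely to prove that
        // weka.dll and the IKVM runtime assemblies load and resolve.
        var attributes = new FastVector(1);
        attributes.addElement(new Attribute("age"));
        var emptyDataSet = new Instances("smokeTest", attributes, 0);

        Console.WriteLine("Attributes: " + emptyDataSet.numAttributes());
        Console.WriteLine("Rows: " + emptyDataSet.numInstances());
    }
}
```

If this compiles and runs without a type-load or missing-assembly error, the converted library is usable from .NET; a failure at this point usually points at a missing reference rather than at WEKA itself.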
Sample source code to show a complete learning process. 
 
using System;
using weka.classifiers;
using weka.classifiers.functions;
using weka.core;
// weka.core.Attribute would otherwise clash with System.Attribute.
using Attribute = weka.core.Attribute;
 
namespace TestApp
{
    internal class Sample
    {
        private void ExecuteWekaTutorial()
        {
            FastVector allAttributes = createAttributes();
            Instances learningDataset = CreateLearningDataSet(allAttributes);
            Classifier predictiveModel = LearnPredictiveModel(learningDataset);
            Evaluation evaluation = evaluatePredictiveModel(predictiveModel, learningDataset);
            Console.WriteLine(evaluation.toSummaryString());
            PredictUnknownCases(learningDataset, predictiveModel);
        }
 
        private FastVector createAttributes()
        {
            var ageAttribute = new Attribute("age");
            var genderAttributeValues = new FastVector(2);
            genderAttributeValues.addElement("male");
            genderAttributeValues.addElement("female");
            var genderAttribute = new Attribute("gender",
                                                genderAttributeValues);
            var numLoginsAttribute = new Attribute("numLogins");
            var allAttributes = new FastVector(3);
            allAttributes.addElement(ageAttribute);
            allAttributes.addElement(genderAttribute);
            allAttributes.addElement(numLoginsAttribute);
            return allAttributes;
        }
 
        private Instances CreateLearningDataSet(FastVector allAttributes)
        {
            var trainingDataSet = new Instances("wekaTutorial", allAttributes, 4);
            // The class attribute is the value to predict: numLogins (index 2).
            trainingDataSet.setClassIndex(2);
            AddInstance(trainingDataSet, 20.0, "male", 5);
            AddInstance(trainingDataSet, 30.0, "female", 2);
            AddInstance(trainingDataSet, 40.0, "male", 3);
            AddInstance(trainingDataSet, 35.0, "female", 4);
            return trainingDataSet;
        }
 
        private void AddInstance(Instances trainingDataSet,
                                 double age, String gender, int numLogins)
        {
            Instance instance = createInstance(trainingDataSet, age,
                                               gender, numLogins);
            trainingDataSet.add(instance);
        }
 
        private Instance createInstance(Instances associatedDataSet,
                                        double age, String gender, int numLogins)
        {
            var instance = new Instance(3);
            instance.setDataset(associatedDataSet);
            instance.setValue(0, age);
            instance.setValue(1, gender);
            instance.setValue(2, numLogins);
            return instance;
        }
 
        private Classifier LearnPredictiveModel(Instances learningDataset)
        {
            Classifier classifier = getClassifier();
            classifier.buildClassifier(learningDataset);
            return classifier;
        }
 
        private Classifier getClassifier()
        {
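            // RBFNetwork is WEKA's radial basis function network from
            // weka.classifiers.functions; swap in a custom subclass here if needed.
            var learner = new RBFNetwork();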
            learner.setNumClusters(2);
 
            return learner;
        }
 
        private Evaluation evaluatePredictiveModel(Classifier classifier,
                                                   Instances learningDataset)
        {
            var learningSetEvaluation =
                new Evaluation(learningDataset);
            learningSetEvaluation.evaluateModel(classifier,
                                                learningDataset);
            return learningSetEvaluation;
        }
 
        private void PredictUnknownCases(Instances learningDataset,
                                         Classifier predictiveModel)
        {
            Instance testMaleInstance =
                createInstance(learningDataset, 45.0, "male", 0);
            Instance testFemaleInstance =
                createInstance(learningDataset, 45.0, "female", 0);
            double malePrediction =
                predictiveModel.classifyInstance(testMaleInstance);
            double femalePrediction =
                predictiveModel.classifyInstance(testFemaleInstance);
            Console.WriteLine("Predicted number of logins [age=45]: ");
            Console.WriteLine("\tMale = " + malePrediction);
            Console.WriteLine("\tFemale = " + femalePrediction);
        }
 
 
        private static void Main(string[] args)
        {
            var program = new Sample();
            program.ExecuteWekaTutorial();
        }
    }
}

IKVM is, however, not without fault: its GUI libraries are still below par, but it should suffice for all the learning algorithms used in embedded data mining. Part 2 will cover saving the models for future reference (high performance!).
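Until then, one possible way to persist a trained model is WEKA's own weka.core.SerializationHelper, which wraps Java serialization. This is a hedged sketch and not necessarily the approach Part 2 takes: the ModelStore class and its method names are hypothetical, and it assumes Java serialization behaves under IKVM as it does on a regular JVM.

```csharp
using weka.classifiers;
using weka.core;

// Hypothetical helper, not part of the original post.
internal static class ModelStore
{
    // Writes the trained classifier to disk using Java serialization.
    public static void Save(Classifier model, string path)
    {
        SerializationHelper.write(path, model);
    }

    // Reads a previously saved classifier back into memory.
    public static Classifier Load(string path)
    {
        return (Classifier) SerializationHelper.read(path);
    }
}
```

With something like this in place, the sample could call ModelStore.Save(predictiveModel, "logins.model") once after training and ModelStore.Load("logins.model") on later runs, so the model is not rebuilt each time. The file name is only an example.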

Posted on Tuesday, March 2, 2010 2:23 PM .NET


Comments on this post: Embedded Data Mining .NET



Copyright © James Oloo Onyango | Powered by: GeeksWithBlogs.net