Weka API: How to obtain a joint probability, e.g., Pr(A=x, B=y), from a BayesNet object?



























I am using the Weka Java API. I trained a BayesNet on an Instances object (the data set) without specifying a class (label) attribute.



/**
* Initialization
*/
Instances data = ...;
BayesNet bn = new EditableBayesNet(data);
SearchAlgorithm learner = new TAN();
SimpleEstimator estimator = new SimpleEstimator();
/**
* Training
*/
bn.initStructure();
learner.buildStructure(bn, data);
estimator.estimateCPTs(bn);


Suppose the Instances object data has three attributes, A, B and C, and the dependency discovered is B->A, C->B.



The trained BayesNet object bn is not meant for classification (I did not specify a class attribute for data); I just want to calculate the joint probability Pr(A=x, B=y). How do I get this probability from bn?



As far as I know, the distributionForInstance function of BayesNet may be the closest thing to use. It returns the probability distribution for a given instance (in our case, the instance is (A=x, B=y)). To use it, I could create a new Instance object testDataInstance, set the values A=x and B=y, and call distributionForInstance with testDataInstance.



/**
* Obtain Pr(A="x", B="y")
*/
Instance testDataInstance = new SparseInstance(3);
Instances testDataSet = new Instances(
bn.m_Instances);
testDataSet.clear();
testDataInstance.setValue(testDataSet.attribute("A"), "x");
testDataInstance.setValue(testDataSet.attribute("B"), "y");
testDataSet.add(testDataInstance);
bn.distributionForInstance(testDataSet.firstInstance());


However, to my knowledge, the returned distribution gives the probabilities of all possible values of the class attribute in the BayesNet. Since I did not specify a class attribute for data, it is unclear to me what the returned probability distribution means.

















      java machine-learning weka bayesian bayesian-networks














      edited Nov 27 '18 at 14:38







      Zhongjun 'Mark' Jin

















      asked Nov 20 '18 at 22:40









      Zhongjun 'Mark' Jin
























          1 Answer









          The javadoc page for distributionForInstance says that it calculates the class membership probabilities: http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/BayesNet.html#distributionForInstance-weka.core.Instance-



          So that is probably not what you want. I think you can use getDistribution(int nTargetNode) or getDistribution(java.lang.String sName) (both are methods of EditableBayesNet) to get what you need.



          P(A=x, B=y) can be calculated as follows,



          P(A=x|B=y) = P(A=x, B=y)/P(B=y), which implies,

          P(A=x, B=y) = P(A=x|B=y)*P(B=y)
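For the structure described in the question (C->B and B->A), B is not a root node, so P(B=y) is not stored directly in a CPT and has to be obtained by summing over C. A short derivation, assuming the network factorizes as P(A,B,C) = P(A|B) P(B|C) P(C):

```latex
\begin{aligned}
P(A=x, B=y) &= \sum_{c} P(A=x, B=y, C=c) \\
            &= \sum_{c} P(A=x \mid B=y)\, P(B=y \mid C=c)\, P(C=c) \\
            &= P(A=x \mid B=y) \sum_{c} P(B=y \mid C=c)\, P(C=c)
\end{aligned}
```

The sum in the last line is P(B=y), which matches the product rule above.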


          Here is some pseudocode that illustrates the approach:



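          // bn was created as an EditableBayesNet, which declares getDistribution
          EditableBayesNet ebn = (EditableBayesNet) bn;
          double[][] AP = ebn.getDistribution("A"); // P(A|B) table, AP[b][a]
          double[][] BP = ebn.getDistribution("B"); // P(B|C) table, BP[c][b]
          double[][] CP = ebn.getDistribution("C"); // P(C) table; C has no parents, so a single row
          double BPy = 0;

          // I am assuming x and y are value indices; if they are nominal values,
          // they can be looked up with data.attribute("A").indexOfValue("x") (same for B)
          // BP[c][y] represents P(B=y|C=c) and AP[y][x] represents P(A=x|B=y)
          for (int c = 0; c < BP.length; c++) {
              BPy += BP[c][y] * CP[0][c]; // marginalize over C
          }
          // BPy now contains P(B=y)
          System.out.println(AP[y][x] * BPy);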
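The arithmetic can be checked without Weka. Below is a minimal sketch with hand-written CPTs for the chain C->B->A; the class name JointFromCpts, the array names, and all the numbers are hypothetical, and the [parent value][node value] layout is an assumption chosen to mirror the tables used above.

```java
// Hypothetical stand-alone illustration; it does not call Weka.
class JointFromCpts {

    /**
     * Computes P(A=a, B=b) for the chain C -> B -> A.
     * cptA[b][a] = P(A=a | B=b), cptB[c][b] = P(B=b | C=c), priorC[c] = P(C=c).
     */
    public static double joint(double[][] cptA, double[][] cptB,
                               double[] priorC, int a, int b) {
        // Marginalize C out of P(B | C) to get P(B=b).
        double marginalB = 0.0;
        for (int c = 0; c < priorC.length; c++) {
            marginalB += cptB[c][b] * priorC[c];
        }
        // Product rule: P(A=a, B=b) = P(A=a | B=b) * P(B=b).
        return cptA[b][a] * marginalB;
    }

    public static void main(String[] args) {
        double[] priorC = {0.6, 0.4};                // P(C)
        double[][] cptB = {{0.7, 0.3}, {0.2, 0.8}};  // rows: C, columns: B
        double[][] cptA = {{0.9, 0.1}, {0.5, 0.5}};  // rows: B, columns: A
        System.out.println("P(A=0, B=1) = " + joint(cptA, cptB, priorC, 0, 1));
    }
}
```

A quick sanity check on any such computation is that the joint values summed over every (a, b) pair come out to 1.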





























          • Thanks @mettleap. This is exactly what I thought. We need the conditional probability P(A=x|B=y) and the marginal probability P(B=y) to get the joint probability P(A=x, B=y). I found that BayesNet has a function "getMargin" that is supposed to return the marginal probability distribution of a given node, which seems to be an alternative way to get BPy. However, "getMargin" returns all zeros for every node. Do you know why that is?

            – Zhongjun 'Mark' Jin
            Nov 27 '18 at 2:58






          • @Zhongjun'Mark'Jin, I think you have to call estimateCPTs(BayesNet bayesNet) in the SimpleEstimator class first so that it fills the CPTs; then it may give the correct values. Also, there is an alpha parameter that you have to set for the SimpleEstimator, which determines the actual values in the CPTs.

            – mettleap
            Nov 27 '18 at 5:39











          • Thanks. I did run estimateCPTs (I forgot to add it to the post previously). I did not explicitly set alpha for the estimator, so it was 0.5 by default.

            – Zhongjun 'Mark' Jin
            Nov 27 '18 at 7:11













          • Btw, I created a new thread at stackoverflow.com/questions/53494595/… for this question.

            – Zhongjun 'Mark' Jin
            Nov 27 '18 at 7:22
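To illustrate the remark in the comments that alpha affects the values in the CPTs, here is a small sketch of additive smoothing over raw counts. The formula (count + alpha) / (total + alpha * numberOfValues) is the common textbook form; that SimpleEstimator uses exactly this formula is an assumption here, and the class name SmoothedCpt and the counts are hypothetical.

```java
// Hypothetical stand-alone illustration; it does not call Weka.
class SmoothedCpt {

    /**
     * Estimates one CPT row from value counts with additive smoothing:
     * p[i] = (counts[i] + alpha) / (total + alpha * counts.length).
     * Assumes total + alpha * counts.length is greater than zero.
     */
    public static double[] estimate(int[] counts, double alpha) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        double[] probs = new double[counts.length];
        double denominator = total + alpha * counts.length;
        for (int i = 0; i < counts.length; i++) {
            probs[i] = (counts[i] + alpha) / denominator;
        }
        return probs;
    }

    public static void main(String[] args) {
        int[] counts = {3, 0, 1}; // made-up observation counts for one parent configuration
        for (double alpha : new double[] {0.0, 0.5}) {
            System.out.println("alpha=" + alpha + " -> "
                    + java.util.Arrays.toString(estimate(counts, alpha)));
        }
    }
}
```

With alpha = 0 the unseen value gets probability zero; any positive alpha moves every entry toward the uniform distribution, so CPT entries depend on the alpha in use.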














































          answered Nov 26 '18 at 18:00









          mettleap












