Merged
4 changes: 4 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
+# stochtree 0.1.1
+
+* Fixed initialization bug in several R package code examples for random effects models
+
# stochtree 0.1.0

* Initial release on CRAN.
8 changes: 7 additions & 1 deletion README.md
@@ -90,7 +90,13 @@ pip install matplotlib seaborn jupyterlab

# R Package

-The package can be installed in R via
+The R package can be installed from CRAN via

```
install.packages("stochtree")
```

The development version of `stochtree` can be installed from GitHub via

```
remotes::install_github("StochasticTree/stochtree", ref="r-dev")
2 changes: 1 addition & 1 deletion demo/notebooks/causal_inference.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Causal Inference Demo Notebook"
"# Causal Inference"
]
},
{
2 changes: 1 addition & 1 deletion demo/notebooks/causal_inference_feature_subsets.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Causal Inference with Feature Subsets Demo Notebook\n",
"# Causal Inference with Feature Subsets\n",
"\n",
"This is a duplicate of the main causal inference demo which shows how a user might decide to use only a subset of covariates in the treatment effect forest. \n",
"Why might we want to do that? Well, in many cases it is plausible that some covariates (for example age, income, etc...) influence the outcome of interest \n",
18 changes: 7 additions & 11 deletions demo/notebooks/heteroskedastic_supervised_learning.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Supervised Learning with Heteroskedasticity Demo Notebook"
"# Heteroskedastic Supervised Learning"
]
},
{
@@ -118,13 +118,6 @@
"s_x_test = s_x[test_inds]\n"
]
},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Demo 1: Using `W` in a linear leaf regression"
-]
-},
{
"cell_type": "markdown",
"metadata": {},
@@ -139,9 +132,12 @@
"outputs": [],
"source": [
"bart_model = BARTModel()\n",
"bart_params = {'num_trees_mean': 100, 'num_trees_variance': 50, 'sample_sigma_global': True, 'sample_sigma_leaf': False}\n",
"global_params = {'sample_sigma2_global': True}\n",
"mean_params = {'num_trees': 100, 'sample_sigma2_leaf': False}\n",
"variance_params = {'num_trees': 50}\n",
"bart_model.sample(X_train=X_train, y_train=y_train, X_test=X_test, basis_train=basis_train, basis_test=basis_test,\n",
" num_gfr=10, num_mcmc=100, params=bart_params)"
" num_gfr=10, num_mcmc=100, general_params=global_params, mean_forest_params=mean_params, \n",
" variance_forest_params=variance_params)"
]
},
{
@@ -171,7 +167,7 @@
"metadata": {},
"outputs": [],
"source": [
"forest_preds_s_x_mcmc = bart_model.sigma_x_test\n",
"forest_preds_s_x_mcmc = np.sqrt(bart_model.sigma2_x_test)\n",
"s_x_avg_mcmc = np.squeeze(forest_preds_s_x_mcmc).mean(axis = 1, keepdims = True)\n",
"s_x_df_mcmc = pd.DataFrame(np.concatenate((np.expand_dims(s_x_test,1), s_x_avg_mcmc), axis = 1), columns=[\"True standard deviation\", \"Average estimated standard deviation\"])\n",
"sns.scatterplot(data=s_x_df_mcmc, x=\"Average estimated standard deviation\", y=\"True standard deviation\")\n",
4 changes: 2 additions & 2 deletions demo/notebooks/multivariate_treatment_causal_inference.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Causal Inference with Multivariate Treatments Demo Notebook"
"# Multivariate Treatment Causal Inference"
]
},
{
@@ -45,7 +45,7 @@
"rng = np.random.default_rng()\n",
"\n",
"# Generate covariates and basis\n",
"n = 5000\n",
"n = 500\n",
"p_X = 5\n",
"X = rng.uniform(0, 1, (n, p_X))\n",
"pi_X = np.c_[0.25 + 0.5*X[:,0], 0.75 - 0.5*X[:,1]]\n",
28 changes: 12 additions & 16 deletions demo/notebooks/prototype_interface.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Demo of the `StochTree` Prototype Interface"
"# Low-Level Interface"
]
},
{
@@ -106,7 +106,7 @@
"rng = np.random.default_rng(random_seed)\n",
"\n",
"# Generate covariates and basis\n",
"n = 1000\n",
"n = 500\n",
"p_X = 10\n",
"p_W = 1\n",
"X = rng.uniform(0, 1, (n, p_X))\n",
@@ -383,14 +383,14 @@
"rng = np.random.default_rng(random_seed)\n",
"\n",
"# Generate covariates and basis\n",
"n = 1000\n",
"n = 500\n",
"p_X = 5\n",
"X = rng.uniform(0, 1, (n, p_X))\n",
"pi_X = 0.25 + 0.5*X[:,0]\n",
"pi_X = 0.35 + 0.3*X[:,0]\n",
"Z = rng.binomial(1, pi_X, n).astype(float)\n",
"\n",
"# Define the outcome mean functions (prognostic and treatment effects)\n",
"mu_X = pi_X*5\n",
"mu_X = (pi_X - 0.5)*30\n",
"# tau_X = np.sin(X[:,1]*2*np.pi)\n",
"tau_X = X[:,1]*2\n",
"\n",
@@ -423,24 +423,24 @@
"min_samples_leaf_mu = 1\n",
"num_trees_mu = 200\n",
"cutpoint_grid_size_mu = 100\n",
"tau_init_mu = 1/200\n",
"tau_init_mu = 1/num_trees_mu\n",
"leaf_prior_scale_mu = np.array([[tau_init_mu]], order='C')\n",
"a_leaf_mu = 3.\n",
"b_leaf_mu = 1/200\n",
"b_leaf_mu = 1/num_trees_mu\n",
"leaf_regression_mu = False\n",
"feature_types_mu = np.repeat(0, p_X).astype(int) # 0 = numeric\n",
"var_weights_mu = np.repeat(1/(p_X + 1), p_X + 1)\n",
"\n",
"# Treatment forest parameters\n",
"alpha_tau = 0.25\n",
"alpha_tau = 0.75\n",
"beta_tau = 3.\n",
"min_samples_leaf_tau = 1\n",
"num_trees_tau = 50\n",
"cutpoint_grid_size_tau = 100\n",
"tau_init_tau = 1/50\n",
"tau_init_tau = 1/num_trees_tau\n",
"leaf_prior_scale_tau = np.array([[tau_init_tau]], order='C')\n",
"a_leaf_tau = 3.\n",
"b_leaf_tau = 1/50\n",
"b_leaf_tau = 1/num_trees_tau\n",
"leaf_regression_tau = True\n",
"feature_types_tau = np.repeat(0, p_X).astype(int) # 0 = numeric\n",
"var_weights_tau = np.repeat(1/p_X, p_X)\n",
@@ -466,7 +466,7 @@
"source": [
"# Prognostic Forest Dataset (covariates)\n",
"dataset_mu = Dataset()\n",
"dataset_mu.add_covariates(np.c_[X,pi_X])\n",
"dataset_mu.add_covariates(np.c_[X, pi_X])\n",
"\n",
"# Treatment Forest Dataset (covariates and treatment variable)\n",
"dataset_tau = Dataset()\n",
@@ -521,7 +521,7 @@
"outputs": [],
"source": [
"num_warmstart = 10\n",
"num_mcmc = 500\n",
"num_mcmc = 100\n",
"num_samples = num_warmstart + num_mcmc\n",
"global_var_samples = np.concatenate((np.array([global_variance_init]), np.repeat(0, num_samples)))\n",
"leaf_scale_samples_mu = np.concatenate((np.array([tau_init_mu]), np.repeat(0, num_samples)))\n",
@@ -562,8 +562,6 @@
" forest_sampler_tau.sample_one_iteration(forest_container_tau, active_forest_tau, dataset_tau, residual, cpp_rng, \n",
" feature_types_tau, cutpoint_grid_size_tau, leaf_prior_scale_tau, var_weights_tau, \n",
" 0.0, 0.0, global_var_samples[i], 1, True, True, False)\n",
" # leaf_scale_samples_tau[i+1] = leaf_var_model_tau.sample_one_iteration(forest_container_tau, cpp_rng, a_leaf_tau, b_leaf_tau)\n",
" # leaf_prior_scale_tau[0,0] = leaf_scale_samples_tau[i+1]\n",
" tau_x = np.squeeze(active_forest_tau.predict_raw(dataset_tau))\n",
" s_tt0 = np.sum(tau_x*tau_x*(Z==0))\n",
" s_tt1 = np.sum(tau_x*tau_x*(Z==1))\n",
@@ -606,8 +604,6 @@
" forest_sampler_tau.sample_one_iteration(forest_container_tau, active_forest_tau, dataset_tau, residual, cpp_rng, \n",
" feature_types_tau, cutpoint_grid_size_tau, leaf_prior_scale_tau, var_weights_tau, \n",
" 0.0, 0.0, global_var_samples[i], 1, True, False, False)\n",
" # leaf_scale_samples_tau[i+1] = leaf_var_model_tau.sample_one_iteration(forest_container_tau, cpp_rng, a_leaf_tau, b_leaf_tau, i)\n",
" # leaf_prior_scale_tau[0,0] = leaf_scale_samples_tau[i+1]\n",
" tau_x = np.squeeze(active_forest_tau.predict_raw(dataset_tau))\n",
" s_tt0 = np.sum(tau_x*tau_x*(Z==0))\n",
" s_tt1 = np.sum(tau_x*tau_x*(Z==1))\n",
23 changes: 20 additions & 3 deletions demo/notebooks/serialization.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Serialization Demo Notebook"
"# Model Serialization"
]
},
{
@@ -29,6 +29,7 @@
"source": [
"import json\n",
"import numpy as np\n",
"import os\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
@@ -120,7 +121,7 @@
"outputs": [],
"source": [
"bart_model = BARTModel()\n",
"bart_model.sample(X_train=X_train, y_train=y_train, basis_train=basis_train, X_test=X_test, basis_test=basis_test, num_gfr=10, num_mcmc=100)"
"bart_model.sample(X_train=X_train, y_train=y_train, basis_train=basis_train, X_test=X_test, basis_test=basis_test, num_gfr=10, num_mcmc=10)"
]
},
{
@@ -150,7 +151,7 @@
"metadata": {},
"outputs": [],
"source": [
"sigma_df_mcmc = pd.DataFrame(np.concatenate((np.expand_dims(np.arange(bart_model.num_samples - bart_model.num_gfr),axis=1), np.expand_dims(bart_model.global_var_samples,axis=1)), axis = 1), columns=[\"Sample\", \"Sigma\"])\n",
"sigma_df_mcmc = pd.DataFrame(np.concatenate((np.expand_dims(np.arange(bart_model.num_samples),axis=1), np.expand_dims(bart_model.global_var_samples,axis=1)), axis = 1), columns=[\"Sample\", \"Sigma\"])\n",
"sns.scatterplot(data=sigma_df_mcmc, x=\"Sample\", y=\"Sigma\")\n",
"plt.show()"
]
@@ -321,6 +322,22 @@
"plt.axline((0, 0), slope=1, color=\"black\", linestyle=(0, (3,3)))\n",
"plt.show()"
]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Clean up JSON file"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"os.remove('bart.json')"
+]
}
],
"metadata": {
2 changes: 1 addition & 1 deletion demo/notebooks/supervised_learning.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Supervised Learning Demo Notebook"
"# Supervised Learning"
]
},
{
168 changes: 29 additions & 139 deletions demo/notebooks/tree_inspection.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion include/stochtree/leaf_model.h
@@ -239,7 +239,7 @@ namespace StochTree {
* \beta \sim N\left(0, \tau\right)
* \f]
*
-* Allowing for case / variance weights $w_i$ as above, we derive a reduced log marginal likelihood of
+* Allowing for case / variance weights \f$w_i\f$ as above, we derive a reduced log marginal likelihood of
*
* \f[
* L(y) \propto \frac{1}{2} \log\left(\frac{\sigma^2}{s_{wxx,\ell} \tau + \sigma^2}\right) + \frac{\tau s_{wyx,\ell}^2}{2\sigma^2(s_{wxx,\ell} \tau + \sigma^2)}
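Reviewer note: for readers checking the documented formula, the reduced log marginal likelihood in this hunk can be evaluated directly from the leaf sufficient statistics. The sketch below is a plain Python transcription of the expression as written in the comment. The function name and argument names are hypothetical and this is not the C++ implementation in `leaf_model.h`.

```python
import math


def reduced_log_marginal_likelihood(s_wxx, s_wyx, tau, sigma2):
    """Evaluate the documented expression (up to an additive constant).

    s_wxx, s_wyx: weighted sufficient statistics for one leaf, taken as inputs
    tau: prior variance of the leaf regression coefficient
    sigma2: global error variance
    """
    denom = s_wxx * tau + sigma2
    log_term = 0.5 * math.log(sigma2 / denom)
    quad_term = (tau * s_wyx ** 2) / (2.0 * sigma2 * denom)
    return log_term + quad_term


# With s_wxx = 1, s_wyx = 2, tau = 1, sigma2 = 1 the expression reduces to
# 0.5 * log(1/2) + 4 / (2 * 1 * 2) = 0.5 * log(0.5) + 1
value = reduced_log_marginal_likelihood(1.0, 2.0, 1.0, 1.0)
```

When `tau` is zero the prior pins the coefficient at zero, both terms vanish, and the function returns 0, which is a convenient sanity check.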
2 changes: 1 addition & 1 deletion include/stochtree/mainpage.h
@@ -33,7 +33,7 @@
* - <b>Leaf Model</b>: `stochtree`'s data structures are generalized to support a wide range of models, which are defined via specialized classes in the \ref leaf_model_group "leaf model layer".
* - <b>Sampler</b>: helper functions that sample forests from training data comprise the \ref sampling_group "sampling layer" of `stochtree`.
*
-* \section extending-stochtree Extending `stochtree`
+* \section extending-stochtree Extending stochtree
*
* \subsection custom-leaf-models Custom Leaf Models
*
7 changes: 4 additions & 3 deletions setup.py
@@ -36,6 +37,7 @@ def build_extension(self, ext: CMakeExtension) -> None:

debug = int(os.environ.get("DEBUG", 0)) if self.debug is None else self.debug
cfg = "Debug" if debug else "Release"
+use_dbg = "ON" if debug else "OFF"

# CMake lets you override the generator - we need to check this.
# Can be set with Conda-Build, for example.
@@ -48,8 +49,8 @@ def build_extension(self, ext: CMakeExtension) -> None:
f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={extdir}{os.sep}",
f"-DPYTHON_EXECUTABLE={sys.executable}",
f"-DCMAKE_BUILD_TYPE={cfg}", # not used on MSVC, but no harm
"-DUSE_DEBUG=OFF",
"-DUSE_SANITIZER=OFF",
f"-DUSE_DEBUG={use_dbg}",
"-DUSE_SANITIZER=OFF",
"-DBUILD_TEST=OFF",
"-DBUILD_DEBUG_TARGETS=OFF",
"-DBUILD_PYTHON=ON",
@@ -151,7 +152,7 @@ def run(self):

# The information here can also be placed in setup.cfg - better separation of
# logic and declaration, and simpler if you include description/version in a file.
-__version__ = "0.0.1"
+__version__ = "0.1.1"

setup(
name="stochtree",
Expand Down