Equivalent of scikit-learn's GroupShuffleSplit in dask-ml?
I'd like to split my data into training and testing sets. I have repeated observations of people over time, so the split must ensure that no person has observations in both the training and the test set. In scikit-learn, I'd do this with GroupShuffleSplit:
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
X = np.array([0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 0.001])
y = np.array(["a", "b", "b", "b", "c", "c", "c", "a"])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train, test = next(gss.split(X, y, groups=groups))
X_train, y_train = X[train], y[train]
X_test, y_test = X[test], y[test]
How can I do this with Dask or Dask-ML?
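One direction that might carry over to Dask is to decide each row's split from its group ID alone, by hashing the ID, so no global shuffle or knowledge of other rows is needed. The following is only a minimal sketch in plain Python under that assumption; `in_test_set` is a hypothetical helper, not part of scikit-learn, Dask, or Dask-ML, and unlike GroupShuffleSplit it only approximates the requested test fraction.

```python
import hashlib


def in_test_set(group_id, test_size=0.2, seed=0):
    """Hypothetical helper: assign a group to the test set from its ID alone."""
    # Hash the seed together with the group ID so that a different seed
    # produces a different (but still deterministic) assignment.
    digest = hashlib.sha256(f"{seed}:{group_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number between 0 and 1.
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < test_size


# Same toy data as in the question, as plain lists.
X = [0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 0.001]
groups = [1, 1, 2, 2, 3, 3, 4, 4]

# Every row of a given group gets the same answer, so groups never straddle the split.
test_mask = [in_test_set(g) for g in groups]
X_train = [x for x, is_test in zip(X, test_mask) if not is_test]
X_test = [x for x, is_test in zip(X, test_mask) if is_test]

train_groups = {g for g, is_test in zip(groups, test_mask) if not is_test}
test_groups = {g for g, is_test in zip(groups, test_mask) if is_test}
assert train_groups.isdisjoint(test_groups)
```

Because the assignment depends only on the row's own group ID, the same function could in principle be applied partition by partition (for example via `map_partitions` on a Dask DataFrame) to build a boolean mask. The trade-off is that the realized test fraction is random: with only a handful of groups, as in this toy data, the test set may be much larger or smaller than `test_size`, or even empty.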
python scikit-learn dask panel-data dask-ml
asked Nov 18 at 2:56
karldw