NumPy: how to randomly split/select a matrix into n different matrices

I have a numpy matrix of shape (4601, 58).
I want to split it randomly into three matrices holding 60%, 20%, and 20% of the rows.
This is for a machine learning task.
Is there a numpy function that randomly selects rows?

2012-02-01 daydreamer

You can use numpy.random.shuffle:

import numpy as np 

N = 4601 
data = np.arange(N*58).reshape(-1, 58) 
np.random.shuffle(data) 

a = data[:int(N*0.6)] 
b = data[int(N*0.6):int(N*0.8)] 
c = data[int(N*0.8):]
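
A variation on the snippet above, as a minimal sketch: it shuffles row indices instead of the array itself, so data keeps its original order, and it uses np.split to cut at the 60% and 80% marks. The seed value and the names order, train_idx, valid_idx, and test_idx are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice, used only for repeatability

N = 4601
data = np.arange(N * 58).reshape(-1, 58)

# permutation(N) returns the row indices 0..N-1 in random order;
# data itself is not modified
order = rng.permutation(N)

# cut the shuffled indices at the 60% and 80% marks
train_idx, valid_idx, test_idx = np.split(order, [int(N * 0.6), int(N * 0.8)])

a = data[train_idx]
b = data[valid_idx]
c = data[test_idx]

print(a.shape, b.shape, c.shape)  # (2760, 58) (920, 58) (921, 58)
```

Because int() truncates, the last part absorbs the leftover row, which is why c has 921 rows rather than 920.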


2012-02-01 02:21:43 HYRY

If you want to select rows at random, you can just use random.sample from the Python standard library:

import random 

population = range(4601)  # one index per row of your matrix
k = 920  # number of rows you require (example value)
choice = random.sample(population, k)

random.sample samples without replacement, so you don't need to worry about repeated rows ending up in choice. Given a numpy array called matrix, you can select the rows by indexing, like this: matrix[choice].

Of course, k can be equal to the total number of elements in the population, in which case choice contains a random ordering of the indices of your rows. You can then partition choice however you like, if that is all you need.
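
To make the last point concrete, here is a small sketch that asks random.sample for as many indices as there are rows and then partitions the resulting list 60/20/20. The toy matrix, the seed, and the names cut1, cut2, first, second, and third are assumptions made for illustration.

```python
import random

import numpy as np

random.seed(0)  # arbitrary seed, only so repeated runs give the same ordering

matrix = np.arange(20).reshape(10, 2)  # toy stand-in for the real (4601, 58) matrix
n_rows = matrix.shape[0]

# k equal to the population size yields a random ordering of all row indices
choice = random.sample(range(n_rows), n_rows)

# integer arithmetic avoids floating-point surprises at the cut points
cut1 = n_rows * 6 // 10
cut2 = n_rows * 8 // 10

first = matrix[choice[:cut1]]
second = matrix[choice[cut1:cut2]]
third = matrix[choice[cut2:]]

print(first.shape, second.shape, third.shape)  # (6, 2) (2, 2) (2, 2)
```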


2012-02-01 00:49:00

As a complement to HYRY's answer, if you want to shuffle several arrays x, y, z consistently, all with the same first dimension: x.shape[0] == y.shape[0] == z.shape[0] == n_samples.

You can do:

rng = np.random.RandomState(42) # reproducible results with a fixed seed 
indices = np.arange(n_samples) 
rng.shuffle(indices) 
x_shuffled = x[indices] 
y_shuffled = y[indices] 
z_shuffled = z[indices]

And then proceed with the split of each shuffled array as in HYRY's answer.
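
Putting the two steps together, a minimal sketch might look like the following. The toy arrays x and y are assumptions built so that row i of x starts with 3 * y[i], which makes it easy to check that features and labels stay aligned after the split.

```python
import numpy as np

n_samples = 10
x = np.arange(n_samples * 3).reshape(n_samples, 3)  # toy features; row i starts with 3*i
y = np.arange(n_samples)                            # toy labels; y[i] belongs to x[i]

rng = np.random.RandomState(42)  # reproducible results with a fixed seed
indices = np.arange(n_samples)
rng.shuffle(indices)

# cut the shared index array at the 60% and 80% marks
train_end = n_samples * 6 // 10
valid_end = n_samples * 8 // 10
train_idx, valid_idx, test_idx = np.split(indices, [train_end, valid_end])

# indexing every array with the same indices keeps rows and labels paired
x_train, y_train = x[train_idx], y[train_idx]
x_valid, y_valid = x[valid_idx], y[valid_idx]
x_test, y_test = x[test_idx], y[test_idx]

print(x_train.shape, x_valid.shape, x_test.shape)  # (6, 3) (2, 3) (2, 3)
```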


2012-02-01 08:18:21 ogrisel

Since you need it for machine learning, here is a method I wrote:

import numpy as np 

def split_random(matrix, percent_train=70, percent_test=15): 
    """ 
    Splits matrix data into randomly ordered sets 
    grouped by provided percentages. 

    Usage: 
    rows = 100 
    columns = 2 
    matrix = np.random.rand(rows, columns) 
    training, testing, validation = split_random(
        matrix, percent_train=80, percent_test=10)

    percent_validation 10 
    training (80, 2) 
    testing (10, 2) 
    validation (10, 2) 

    Returns: 
    - training_data: percent_train e.g. 70%
    - testing_data: percent_test e.g. 15%
    - validation_data: remainder from 100% e.g. 15%
    Created by Uki D. Lucas on Feb. 4, 2017 
    """ 

    percent_validation = 100 - percent_train - percent_test 

    if percent_validation < 0:
        print("Make sure that the provided sum of "
              "training and testing percentages is equal to, "
              "or less than, 100%.")
        percent_validation = 0
    else:
        print("percent_validation", percent_validation)

    #print(matrix) 
    rows = matrix.shape[0] 
    np.random.shuffle(matrix)  # shuffles the rows in place, so the caller's array is reordered too

    end_training = int(rows*percent_train/100)  
    end_testing = end_training + int((rows * percent_test/100)) 

    training = matrix[:end_training] 
    testing = matrix[end_training:end_testing] 
    validation = matrix[end_testing:] 
    return training, testing, validation 

# TEST: 
rows = 100 
columns = 2 
matrix = np.random.rand(rows, columns) 
training, testing, validation = split_random(matrix, percent_train=80, percent_test=10) 

print("training",training.shape) 
print("testing",testing.shape) 
print("validation",validation.shape) 

print(split_random.__doc__)

training (80, 2)
testing (10, 2)
validation (10, 2)
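
Note that split_random reorders the array passed to it, because np.random.shuffle works in place. If that side effect is unwanted, one possible variant is sketched below. The name split_random_copy, the seed parameter, and the choice to raise ValueError instead of printing a message are all assumptions of this sketch, not part of the answer above.

```python
import numpy as np

def split_random_copy(matrix, percent_train=70, percent_test=15, seed=None):
    """Hypothetical variant of split_random that leaves `matrix` untouched."""
    if percent_train < 0 or percent_test < 0 or percent_train + percent_test > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")

    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(matrix)  # shuffled copy of the rows; input is not modified

    rows = shuffled.shape[0]
    end_training = rows * percent_train // 100
    end_testing = end_training + rows * percent_test // 100

    return (shuffled[:end_training],
            shuffled[end_training:end_testing],
            shuffled[end_testing:])

matrix = np.arange(200).reshape(100, 2)
before = matrix.copy()
training, testing, validation = split_random_copy(matrix, 80, 10, seed=0)

print(training.shape, testing.shape, validation.shape)  # (80, 2) (10, 2) (10, 2)
print(np.array_equal(matrix, before))  # True
```

Passing the same seed again returns the same three parts, which can help when comparing models across runs.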


2017-02-04 19:57:48
