UnconGen - Train / Val / Test Split #4

Hi, I have a question about this section of `dataprovider_pypots.py`, specifically this part:

```
    # --- 3. Split Data and Time Info ---
    idx_train, idx_val, idx_test = make_split_indices(ori_data.shape[0], train_ratio, val_ratio, test_ratio)
    
    train_set_X, train_set_time = ori_data[idx_train], time_info[idx_train]
    val_set_X, val_set_time = ori_data[idx_val], time_info[idx_val]
    test_set_X, test_set_time = ori_data[idx_test], time_info[idx_test]    

    # --- 4. Apply Sliding Window to both Features and Time ---
    train_X = sliding_window(train_set_X, seq_len, stride)
    val_X = sliding_window(val_set_X, seq_len, stride)
    test_X = sliding_window(test_set_X, seq_len, stride)
    
    time_info_train = sliding_window(train_set_time, seq_len, stride)
    time_info_val = sliding_window(val_set_time, seq_len, stride)
    time_info_test = sliding_window(test_set_time, seq_len, stride)
```

My understanding is that the sequence is shuffled and then randomly split into train / validation / test sets before any windowing is done.
However, wouldn't this cause the resulting sliding windows to no longer match the real sequences? In particular:
1. Steps are no longer in chronological order.
2. Sequences now contain gaps, since the next "step" of a train sequence can randomly end up in the validation or test set.
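
To make the two points concrete, here is a minimal sketch of what I think is happening. I have not checked the actual implementations, so `shuffled_split_indices` and `sliding_window` below are hypothetical stand-ins: I am assuming `make_split_indices` returns a random permutation of the row indices and that `sliding_window` stacks consecutive rows of whatever array it is given.

```python
import numpy as np

def shuffled_split_indices(n, train_ratio, val_ratio, seed=0):
    """Hypothetical stand-in for make_split_indices (assumes a random permutation)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(n * train_ratio))
    n_val = int(round(n * val_ratio))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def sliding_window(data, seq_len, stride):
    """Hypothetical stand-in for sliding_window: stacks windows of seq_len rows."""
    starts = range(0, len(data) - seq_len + 1, stride)
    return np.stack([data[s:s + seq_len] for s in starts])

# Use the time step itself as the "data" so the ordering is easy to inspect.
time_info = np.arange(20)
idx_train, idx_val, idx_test = shuffled_split_indices(len(time_info), 0.6, 0.2)
train_windows = sliding_window(time_info[idx_train], seq_len=4, stride=1)

# A window matches the real sequence only if every step advances by exactly 1.
steps = np.diff(train_windows, axis=1)
broken = np.any(steps != 1, axis=1)
print(f"{broken.sum()} of {len(broken)} training windows are not contiguous in time")
```

If the real `make_split_indices` returns sorted or contiguous index blocks instead, this sketch does not apply and my concern goes away.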

Can you confirm whether my understanding is correct and, if so, whether and how the modelling addresses these concerns?
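
For comparison, this is roughly what I would have expected: a chronological split where each subset is a contiguous block, so windows built afterwards stay aligned with the real sequence. This is only a sketch under my own assumptions; `chronological_split_indices` is a name I made up, and `sliding_window` is again a hypothetical stand-in rather than the repository's function.

```python
import numpy as np

def chronological_split_indices(n, train_ratio, val_ratio):
    """Hypothetical alternative: contiguous, time-ordered index blocks."""
    n_train = int(round(n * train_ratio))
    n_val = int(round(n * val_ratio))
    idx = np.arange(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def sliding_window(data, seq_len, stride):
    """Hypothetical stand-in for sliding_window: stacks windows of seq_len rows."""
    starts = range(0, len(data) - seq_len + 1, stride)
    return np.stack([data[s:s + seq_len] for s in starts])

time_info = np.arange(20)
idx_train, idx_val, idx_test = chronological_split_indices(len(time_info), 0.6, 0.2)

results = {}
for name, idx in [("train", idx_train), ("val", idx_val), ("test", idx_test)]:
    windows = sliding_window(time_info[idx], seq_len=4, stride=1)
    # Every window should advance by exactly one time step per position.
    contiguous = bool(np.all(np.diff(windows, axis=1) == 1))
    results[name] = (windows.shape[0], contiguous)
    print(f"{name}: {windows.shape[0]} windows, all contiguous: {contiguous}")
```

With this kind of split no window crosses a subset boundary, at the cost of losing the few windows that would have straddled the boundaries.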

Thanks for all your work, it is very helpful!
