The 1st row is the header!
Usually the first row contains the headers.
Simplest: consider each word as having an ID, and do one-hot encoding…
Simplest is one-hot encoding, as we do in Bag of Words!
Then we can use word2vec.
By finding a way to convert the representation into numbers, for example by creating a dictionary of all words and representing each word by its index in that dictionary.
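A minimal sketch of that dictionary-index idea (the toy corpus and names here are made up for illustration):

```python
import torch

# Hypothetical toy corpus; any tokenized text works the same way.
words = "the quick brown fox jumps over the lazy dog".split()

# Dictionary of all words: map each unique word to an integer ID.
word2idx = {w: i for i, w in enumerate(sorted(set(words)))}

# A word is then represented by its index in that dictionary...
indices = torch.tensor([word2idx[w] for w in words])

# ...or, in the simplest scheme, as a one-hot vector of vocabulary size.
one_hot = torch.nn.functional.one_hot(indices, num_classes=len(word2idx))
print(one_hot.shape)  # torch.Size([9, 8]): 9 tokens, 8 unique words
```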
Punctuation and other such things we can remove as preprocessing steps.
This is off-topic:
In one of the Kaggle projects for image classification, the evaluation metric was macro f1_score. So during training, do we need to use the same metric (other than accuracy)? If so, it is not in PyTorch (it's in sklearn), so how can we use it in a GPU-based model?
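Not an authoritative answer, but one common pattern is to keep training with a differentiable loss (e.g. cross-entropy) and only *monitor* macro F1 during evaluation; since metrics need no gradients, the tensors can be detached, moved to the CPU, and handed to sklearn. A minimal sketch with made-up shapes:

```python
import torch
from sklearn.metrics import f1_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical model outputs for a 5-class problem, batch of 32.
logits = torch.randn(32, 5, device=device)
targets = torch.randint(0, 5, (32,), device=device)

# Metrics don't need gradients: detach, move to CPU, convert to numpy.
preds = logits.argmax(dim=1).detach().cpu().numpy()
macro_f1 = f1_score(targets.cpu().numpy(), preds, average="macro")
print(macro_f1)
```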
What exactly is meant by non-linearity here, given that the y = mx + c we used is a linear equation?
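Not from the book, but a small sketch of the idea: a stack of purely linear layers is still one linear map, and the activation function in between is what adds the non-linearity.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 3)

# Two linear layers with no activation collapse into a single linear map:
# W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), i.e. still "mx + c".
stacked = nn.Sequential(nn.Linear(3, 5), nn.Linear(5, 2))

# A non-linear activation between them (ReLU here) breaks that collapse,
# which is what lets the network fit curves rather than just lines/planes.
nonlinear = nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 2))

print(stacked(x).shape, nonlinear(x).shape)  # both torch.Size([4, 2])
```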
Homework
- Try ImageIO and TorchVision
- Read about the h5py module
- Make a cheat sheet of all the handy functions used in today's class
- Think about writing documentation for the permute function (see the sketch after this list)
- Time series
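As a starting point for the permute item, a minimal sketch of what it does (the example shapes are made up):

```python
import torch

# permute reorders dimensions without copying data; it returns a view.
x = torch.randn(2, 3, 4)   # e.g. (N, L, C)
y = x.permute(0, 2, 1)     # reordered to (N, C, L)

print(y.shape)             # torch.Size([2, 4, 3])
print(y.is_contiguous())   # False: only the strides changed, not the storage
```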
Great stuff! Thanks.
Transfer Learning is a way of using already-trained models on different data or tasks. It can be applied in several ways, and is also known as Model Adaptation.
Fine-Tuning is one of the ways of doing Transfer Learning. In it, the already-trained neural network is trained further on the new dataset. The benefits are: 1. a good neural-net architecture to start with; 2. weight initialization is done using the pre-trained weights, so the model converges faster.
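A minimal fine-tuning sketch in PyTorch (resnet18 and the 10-class head are just placeholder choices):

```python
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on ImageNet.
model = models.resnet18(pretrained=True)

# Optionally freeze the pre-trained backbone...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the final layer to match the new task (10 classes here).
model.fc = nn.Linear(model.fc.in_features, 10)

# Training now updates only the new head; unfreezing everything and using
# a small learning rate would be full fine-tuning.
```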
Hi, trying to read and catch up… On page 90 of the book, in section 4.4.2, the author talks about reshaping the bikes data. It says: "We see that the rightmost dimension is the number of columns in the original dataset. Then, in the middle dimension, we have time, split into chunks of 24 sequential hours. In other words, we now have N sequences of L hours in a day, for C channels. To get to our desired N × C × L ordering, we need to transpose the tensor."
My question is: why does the author strictly want the data in N × C × L format, when N × L × C seems more natural, with each row being hourly data and the 17 features in the columns? Isn't N × L × C the "normal" setup? So what is the author trying to achieve by putting the features into the rows rather than keeping them in the columns, with each hour as a separate data point in the rows?
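For context, if I remember the book's code around that passage correctly, the reshape in question is roughly:

```python
# bikes is the original 2-D tensor of hourly rows and 17 columns.
daily_bikes = bikes.view(-1, 24, bikes.shape[1])  # N x L x C: days, hours, columns
daily_bikes = daily_bikes.transpose(1, 2)         # N x C x L: channels-first
```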
The YouTube video shows a Jupyter notebook with more code and tests than what is in https://github.com/deep-learning-with-pytorch/dlwpt-code. Is there a fork or a separate code base for this?
Thanks for checking! No, this was just via the code on there. Which notebook did you notice has a difference? If I'm on an older version that has more context, I can point you to that version.
Based on what I have seen, it honestly seems like a personal preference.
N x C gives a window that shows the data across all features during a particular hour of the day. So, (N, C, 3) would be the data for the 3rd hour of the day. If I wanted to get the average of the temp feature for the first day, I could do:

daily_bikes[0, 10, :].mean()

If instead it was N x L x C, we look at it differently: N x L would be a window that shows an entire day's data for a particular feature. So, (N, L, 10) would be the entire day's data for the temp feature. In this case, the average would be:

daily_bikes[0, :, 10].mean()
As long as we are consistent about what operations we perform and on which dimension, I don't think there's much difference here. This alludes more to the discussion on named tensors in the 3rd chapter.
It probably only matters when it's time to feed the tensor into a NN. That's something I have not tried yet, and choosing either of the above could have an impact at that point; for instance, PyTorch's nn.Conv1d expects its input in N × C × L order, which may be why the book settles on that layout.
I was getting a bit confused about offset and stride. Could you share a small example, if possible?
I found the following link explaining stride: Pytorch tensor stride - how it works - PyTorch Forums
Sure! I'll cover this at the beginning of the next call. Thanks for asking.
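In the meantime, a minimal sketch of the two concepts:

```python
import torch

t = torch.arange(12).view(3, 4)
print(t.stride())          # (4, 1): move 4 storage elements per row, 1 per column
print(t.storage_offset())  # 0: this view starts at the first storage element

# A slice shares the same storage; only the offset (and sizes) change.
s = t[1:, 2:]
print(s.storage_offset())  # 6: element (1, 2) sits at 1*4 + 2 = 6 in storage
print(s.stride())          # (4, 1): strides are unchanged
```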