That’s awesome! Great to see you in the group and looking forward to being part of your journey.
Feel free to introduce yourself in #start-here if you’d like
I'm Paul, an MSc Data Science and Analytics student in the UK, with a strong interest in AI and ML.
LinkedIn: Paul Ntalo - Jr Machine Learning Engineer - Omdena
Since there is a discussion about image augmentation: last year I read this article, which raised a few interesting points.
Especially Pitfall 3: when comparing humans and machines, experimental conditions should be equivalent.
Hi guys, I have a doubt. I read in the Tensors chapter (Chapter 3) that if I explicitly create tensors on my GPU, PyTorch will allocate them in GPU memory rather than in my RAM. But I'm not sure why I got the same id when I created a tensor on the GPU, copied it to the CPU, and checked the id of the first element in storage for each. Can someone help me understand this? Thanks in advance.
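Roughly what I ran (a minimal sketch; the values and the `points_gpu` / `points_cpu` names are placeholders):

```python
import torch

points_gpu = torch.tensor([1.0, 2.0, 3.0], device="cuda")  # lives in GPU memory
points_cpu = points_gpu.to("cpu")                           # copied to host RAM

# Both calls print the same number, which is what confused me
print(id(points_gpu.storage()))
print(id(points_cpu.storage()))
```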
Using `id()` can be tricky for different reasons. An object in Python returns a unique number from `id()` only for the lifetime of that object.
When you use `id(points_gpu.storage())`, you get a number. The reference to that temporary storage object is then dropped; it doesn't exist beyond that call. When you next run `id(points_cpu.storage())`, the memory manager sees a free spot (the location the previous number pointed to), so it can reuse that same location and return the same number. If it didn't do that, repeatedly calling `id(something)` would just keep eating up memory.
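You can see the same reuse with plain Python objects, no PyTorch involved (a small sketch; on CPython the two numbers often, though not always, match):

```python
# Each list is a temporary: it becomes garbage right after id() returns,
# so CPython is free to hand the freed slot to the next temporary.
print(id([1, 2, 3]))
print(id([4, 5, 6]))  # frequently the same number as above on CPython
```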
However, if you did something like this, keeping named references alive:

```python
t1 = points_gpu.storage()  # the name t1 keeps this storage object alive
print(id(t1))
t2 = points_cpu.storage()  # likewise for t2
print(id(t2))
print(id(t1))
```

Now you would likely get different numbers, and they would persist, because the objects are being kept in memory. If you have worked with pandas, this is similar to the difference between calls like these:
```python
a.dropna()              # returns a new DataFrame; a is untouched
a.dropna(inplace=True)  # modifies a in place
a = a.dropna()          # rebinds a to the new DataFrame
```

The first one makes no change to `a`: it shows us the output, but `a` is not affected. The second and third ones let the change persist in `a`.
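Tying that back to `id()` (a quick check; the DataFrame here is made up):

```python
import pandas as pd

a = pd.DataFrame({"x": [1.0, None, 3.0]})
before = id(a)

a.dropna(inplace=True)  # mutates the existing object in place
print(id(a) == before)  # True: still the same object

a = a.dropna()          # builds a new DataFrame and rebinds the name
print(id(a) == before)  # False: a now points at a new object
```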
For PyTorch, `data_ptr()` can be used instead; it returns the address of the first element of the tensor. So if you ran:

```python
points_gpu.storage().data_ptr()  # address in GPU memory
points_cpu.storage().data_ptr()  # address in host (CPU) memory
```
You would get different numbers for both. When I ran it, I got something like:

```
30104616960
2068614157952
```

That makes sense to me: the second one is longer and corresponds to the CPU, so I would expect it to cover more memory locations than the GPU one. So your tensors are being created and used as expected. `id()` might not be the best option for checking this, because of how Python/CPython defines and handles it internally.
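For reference, here is a self-contained version of that check (a sketch; it assumes a CUDA device is available, and the tensor values are arbitrary):

```python
import torch

points = torch.ones(3, 2)
points_gpu = points.to("cuda")     # allocated by the CUDA allocator
points_cpu = points_gpu.to("cpu")  # copied back into host RAM

# Different allocators and address spaces, so these values differ
print(points_gpu.storage().data_ptr())
print(points_cpu.storage().data_ptr())

# Views share the same storage, so their data_ptr values do match
print(points_cpu.data_ptr() == points_cpu[0].data_ptr())  # True
```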
Note: Everything I shared above is likely a high-level overview of how it actually works (and I could have made a mistake). I spent more time than I should have trying to understand this, but past a point it becomes very convoluted, and some sources even say this isn't something most people (including programmers) would ever have to bother with.
Interesting question, though!
Many thanks for taking the time out to write this. This is a great answer - very insightful.
Thank you for the session. There is a lot to code and practice.
I wanted to know if you could share how to automate data processing for a local dataset?