<h1 id="pytorch">PyTorch<a aria-hidden="true" class="anchor-heading icon-link" href="#pytorch"></a></h1>
<h2 id="data-primitives">Data Primitives<a aria-hidden="true" class="anchor-heading icon-link" href="#data-primitives"></a></h2>
<p>PyTorch has two primitives for working with data: <code>torch.utils.data.Dataset</code>, which holds the samples, and <code>torch.utils.data.DataLoader</code>, which iterates over them.</p>
<h3 id="dataset">Dataset<a aria-hidden="true" class="anchor-heading icon-link" href="#dataset"></a></h3>
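<p><code>Dataset</code> stores the samples of your data and their corresponding labels. A map-style <code>Dataset</code> maps a key (usually an integer index) to a sample, which is typically a (features, label) pair.</p>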
<ul>
<li>ex. images and their associated labels</li>
</ul>
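<p>A minimal sketch of a custom map-style <code>Dataset</code>, assuming small in-memory tensors. The class name <code>ToyDataset</code> and its contents are hypothetical and only illustrate the two methods a map-style dataset implements, <code>__len__</code> and <code>__getitem__</code>.</p>

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical map-style dataset: index -> (features, label)."""

    def __init__(self, num_samples=8, num_features=3):
        # A local generator keeps the toy data identical on every run.
        generator = torch.Generator().manual_seed(0)
        self.features = torch.randn(num_samples, num_features, generator=generator)
        # Made-up labels: alternate between class 0 and class 1.
        self.labels = torch.arange(num_samples) % 2

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.labels)

    def __getitem__(self, index):
        # Return one sample as a (features, label) pair.
        return self.features[index], self.labels[index]


dataset = ToyDataset()
features, label = dataset[0]
print(len(dataset), features.shape, label)
```

<p>Indexing the dataset returns a single sample, which is what <code>DataLoader</code> relies on when it assembles batches.</p>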
<h3 id="dataloader">DataLoader<a aria-hidden="true" class="anchor-heading icon-link" href="#dataloader"></a></h3>
<p><code>DataLoader</code> wraps an iterable around the <code>Dataset</code>.</p>
<ul>
<li>this is accomplished by passing a <code>Dataset</code> as an argument to <code>DataLoader</code>, which gives us automatic batching, sampling, shuffling and multiprocess data loading.</li>
</ul>
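<p>As a rough illustration, the sketch below wraps a small <code>TensorDataset</code> in a <code>DataLoader</code>. The tensor values and the batch size are made up for the example. <code>num_workers=0</code> loads data in the main process, and a larger value would enable multiprocess loading.</p>

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten made-up samples with two features each, plus alternating labels.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10) % 2
dataset = TensorDataset(features, labels)

# Batching and shuffling are configured on the loader, not on the dataset.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for batch_features, batch_labels in loader:
    # Each iteration yields one batch of stacked features and labels.
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_labels.shape)))

print(batch_shapes)
```

<p>With ten samples and a batch size of four, the loader yields two full batches and one final batch of two samples, because <code>drop_last</code> defaults to <code>False</code>.</p>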
<h2 id="domain-specific-libraries">Domain-specific Libraries<a aria-hidden="true" class="anchor-heading icon-link" href="#domain-specific-libraries"></a></h2>
<ul>
<li>TorchText </li>
<li>TorchVision</li>
<li>TorchAudio</li>
<li>TorchRec - Recommendation Systems</li>
</ul>
<h3 id="torchvision">TorchVision<a aria-hidden="true" class="anchor-heading icon-link" href="#torchvision"></a></h3>
<p>Every TorchVision Dataset includes two arguments: <code>transform</code> and <code>target_transform</code> to modify the samples and labels respectively.</p>
<hr>
<h2 id="reproducibility">Reproducibility<a aria-hidden="true" class="anchor-heading icon-link" href="#reproducibility"></a></h2>
<p>Generating random numbers is an <a href="/notes/k4ib1hhlkmzcrfx8vljz7sg#how-a-neural-network-learns">essential aspect</a> of a deep learning model. However, this presents a problem: if we are generating random numbers, how can we reproduce the same results across different executions of the code and across different machines, given the same data and parameters?</p>
<p>There are a couple of things we can do to limit the amount of nondeterministic behaviour from PyTorch:</p>
<ol>
<li>control sources of randomness that can cause multiple executions of your application to behave differently
<ul>
<li>this involves specifying the seed so that the <a href="/notes/5jswj4f1an4wt4970pq66tv">PRNG</a> produces the same sequence of "random" numbers on every run. Note that identical results are still not guaranteed across PyTorch releases, across platforms, or between CPU and GPU executions.</li>
</ul>
</li>
<li>configure PyTorch to avoid using nondeterministic algorithms for some operations. As a result, multiple calls to those operations, given the same inputs, will produce the same result.</li>
</ol>
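<p>Both steps can be sketched as follows. The helper name <code>seed_everything</code> and the seed value are hypothetical. A project that also uses NumPy would likely seed that library as well.</p>

```python
import random

import torch


def seed_everything(seed):
    """Hypothetical helper: seed the PRNGs this sketch relies on."""
    random.seed(seed)        # Python's built-in PRNG
    torch.manual_seed(seed)  # PyTorch's PRNG


# Step 1: control the sources of randomness by fixing the seed.
seed_everything(42)
first = torch.rand(3)

seed_everything(42)
second = torch.rand(3)

# Re-seeding restarts the sequence, so both tensors are identical.
print(torch.equal(first, second))

# Step 2: ask PyTorch to use deterministic algorithms where they exist.
# Operations that only have a nondeterministic implementation then raise an error.
torch.use_deterministic_algorithms(True)
```

<p>Fixing the seed makes runs repeatable on the same setup. The deterministic-algorithms switch trades some speed for repeatability on operations that would otherwise vary between calls.</p>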
<hr>
<h3 id="torch-hub--torchvisionmodels">Torch Hub / <code>torchvision.models</code><a aria-hidden="true" class="anchor-heading icon-link" href="#torch-hub--torchvisionmodels"></a></h3>
<p>Both give us access to many pre-built deep learning models, often with pretrained weights, which lets us leverage <a href="/notes/03o3n0hz9v9jtb7j889zpd4#transfer-learning">transfer learning</a>.</p>
<h2 id="resources">Resources<a aria-hidden="true" class="anchor-heading icon-link" href="#resources"></a></h2>
<ul>
<li><a href="https://pytorch.org/tutorials/beginner/ptcheat.html">PyTorch Cheatsheet</a></li>
</ul>
<h2 id="learning-resources">Learning Resources<a aria-hidden="true" class="anchor-heading icon-link" href="#learning-resources"></a></h2>
<ul>
<li><a href="https://www.dataquest.io/blog/pytorch-for-beginners/">https://www.dataquest.io/blog/pytorch-for-beginners/</a></li>
<li><a href="https://youtu.be/Z_ikDlimN6A?si=FX6o8eF3Xh6fbqnA&#x26;t=23078">Learn PyTorch for deep learning in a day. Literally</a>
<ul>
<li><a href="https://www.learnpytorch.io/">Accompanying notes</a></li>
</ul>
</li>
</ul>
<hr>
<strong>Children</strong>
<ol>
<li><a href="/notes/elq5ymv90d7r0hzxv8kwmbr">Ap</a></li>
<li><a href="/notes/8qm42jxtuar5qn15w7nphwy">Tensors</a></li>
</ol>
<hr>
<strong>Backlinks</strong>
<ul>
<li><a href="/notes/v3rh8rqc5aorupfno1w4otw">Functions</a></li>
</ul>