<h1 id="connect">Connect<a aria-hidden="true" class="anchor-heading icon-link" href="#connect"></a></h1>
Kafka Connect is Kafka's integration API and subsystem, and answers questions like:
<ul>
<li>"how do we get data from other systems into our Kafka topics?"
<ul>
<li>answer: source connectors</li>
</ul>
</li>
<li>"how do we get data from our Kafka topics into our other systems?"
<ul>
<li>answer: sink connectors</li>
</ul>
</li>
</ul>
<img src="/assets/images/2023-06-27-08-49-13.png">
Kafka Connect is an ecosystem of pluggable connectors.
<ul>
<li>a connector is packaged as one or more <code>.JAR</code> files that are installed on the Connect workers</li>
</ul>
The job of many source/sink connectors is part of the well-trodden path.
<ul>
<li>that is, the code that moves data from a topic to an S3 bucket, from a topic to Elasticsearch, or from a topic to records in a relational database is unlikely to vary from one business to the next.</li>
</ul>
Connect abstracts away much of the data integration code, and allows us to write JSON config in its place.
<ul>
<li>ex. the following JSON is how we would stream data from Kafka into Elasticsearch
<ul>
<li>by doing this, we no longer need to write the code that subscribes to a topic, gets messages, and calls the Elasticsearch API</li>
<li>As long as someone has already written an Elasticsearch connector, we can deploy that connector to our Connect cluster and submit the JSON config to the cluster's REST API: either <code>PUT /connectors/{name}/config</code> with the config as shown, or <code>POST /connectors</code> with it wrapped as <code>{"name": ..., "config": {...}}</code>. Doing so instantiates the deployed <code>.JAR</code> as a running connector.</li>
</ul>
</li>
</ul>
<pre class="language-json"><code class="language-json">{
 "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
 "tasks.max": "1",
 "topics": "simple.elasticsearch.data",
 "name": "simple-elasticsearch-connector",
 "connection.url": "http://elasticsearch:9200",
 "type.name": "_doc"
}
</code></pre>
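As a rough sketch of the submission step, the config above can be wrapped into the body that <code>POST /connectors</code> expects. The worker address <code>http://localhost:8083</code> is an assumption (8083 is the default REST port), and <code>build_create_request</code> is a hypothetical helper name, not part of any Kafka library. The request is only built here, not sent.

```python
import json
import urllib.request

# The connector settings from the JSON above, minus "name", which the
# POST /connectors body carries at the top level.
CONFIG = {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "simple.elasticsearch.data",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
}


def build_create_request(base_url, name, config):
    """Build (but do not send) the request that creates a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed worker address; adjust to wherever the Connect REST API listens.
request = build_create_request(
    "http://localhost:8083", "simple-elasticsearch-connector", CONFIG
)

# To actually submit it against a running worker:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before anything touches the cluster.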
To a Kafka cluster, Connect looks like a producer or consumer (or both).
Connect workers run as their own processes, separate from the Kafka brokers, and typically on their own hardware.
Connect is designed to be scalable and fault-tolerant:
<ul>
<li>this means we can have a cluster of Connect workers to share the load of moving data in and out of Kafka topics.</li>
</ul>
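To make "sharing the load" concrete, here is a toy model that spreads a connector's tasks across workers round-robin and recomputes the spread when a worker disappears. This is an illustration only: Connect's real rebalancing is more involved and is not modeled here, and the worker and task names are made up.

```python
def assign_tasks(task_ids, workers):
    """Spread task ids across workers round-robin (toy model only)."""
    assignment = {worker: [] for worker in workers}
    for index, task_id in enumerate(task_ids):
        assignment[workers[index % len(workers)]].append(task_id)
    return assignment


# Hypothetical task and worker names.
tasks = ["es-sink-0", "es-sink-1", "es-sink-2", "es-sink-3"]

# Three healthy workers share the four tasks.
before = assign_tasks(tasks, ["worker-a", "worker-b", "worker-c"])

# worker-c is lost; the same tasks are spread over the survivors.
after = assign_tasks(tasks, ["worker-a", "worker-b"])
```

The point of the sketch is that no task is tied to a particular machine: as long as one worker remains, every task still has somewhere to run.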
<h3 id="worker">Worker<a aria-hidden="true" class="anchor-heading icon-link" href="#worker"></a></h3>
A Connect worker is a node in the Connect cluster.
A worker runs one or more connectors and their tasks.
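One way to see which worker is running what is the <code>GET /connectors/{name}/status</code> endpoint, which reports a <code>worker_id</code> for the connector and for each of its tasks. The sketch below groups those by worker. The sample response is hypothetical (the addresses are invented), and <code>units_by_worker</code> is an illustrative helper, not a Connect API.

```python
from collections import defaultdict

# Hypothetical response body, shaped like GET /connectors/{name}/status.
SAMPLE_STATUS = {
    "name": "simple-elasticsearch-connector",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.2:8083"},
    ],
}


def units_by_worker(status):
    """Map each worker id to the connector/task units it is running."""
    placement = defaultdict(list)
    placement[status["connector"]["worker_id"]].append("connector")
    for task in status["tasks"]:
        placement[task["worker_id"]].append(f"task-{task['id']}")
    return dict(placement)


placement = units_by_worker(SAMPLE_STATUS)
```

In this sample the connector instance and its single task sit on different workers, which is allowed: placement is decided per unit, not per connector.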
<h2 id="resources">Resources<a aria-hidden="true" class="anchor-heading icon-link" href="#resources"></a></h2>
<ul>
<li><a href="https://www.confluent.io/hub/">Confluent hub: list of connectors</a></li>
</ul>

