“Streaming vs. Batch” Is a Wrong Dichotomy, and I Think It's Confusing

fifilura 3 hours

"Try it yourself" "very quickly wanted to get real-time streaming for more"

My experience is the opposite.

You think you need streaming, so you "try it out" and build something incredibly complex with Kafka, that needs 24h maintenance to monitor congestion in every pipeline.

And 10x more expensive because your servers are always up.

And some clever (expensive) engineers who figure out how watermarks, out-of-orderness and streaming joins really work, and how you can implement them in a parallel way without SQL.

And of course a renovate bot to upgrade your fancy (but half-baked) framework (Flink) to the latest version.

And you want to tune your logic? Luckily the last 3 hours of data are stored in Kafka, so all you have to do is reset all consumer offsets, clean your pipelines, and restart your job, and the input data will hopefully be almost the same as the last time you ran it. (Compare that to changing a parameter and re-running an SQL query.)

When all your business case really needed was a monthly report. And that you can achieve with pub/sub and an SQL query.

In my experience the need for live data rarely comes from a business case, but from a desire to see your data live.

And if it indeed comes from a business case, you are still better off prototyping with something simple and seeing if it really flies before you "try it out".

efitz 8 hours

Batch processes IRL tend to “fetch” data, which is nothing like streaming.

For example, I’ve worked with “batch” systems that periodically go do fetches from databases or S3 buckets and then do lots of crunching, before storing the results.

Sometimes batch systems have separate fetchers and only operate against a local store; they’re still batch.

Streaming systems may have local aggregation or clumping in the arriving information; that doesn’t make it a “batch” system. Likewise streaming systems may process more than one work item simultaneously; still not a “batch”.

I associate “batch” more with “schedule” or “periodic” and “fetch”; I associate “stream” with “continuous” and “receiver”.
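
A minimal sketch of that distinction, assuming hypothetical names (`Store`, `batch_job`, `Receiver`) that are not from any real system: the batch job goes and fetches whatever is there when it runs, while the receiver is handed each item as it arrives.

```python
from typing import List


class Store:
    """Hypothetical data store (stands in for a database or an S3 bucket)."""

    def __init__(self) -> None:
        self.rows: List[int] = []


def batch_job(store: Store) -> int:
    """Batch style: run on a schedule, fetch what is available, then crunch it."""
    fetched = list(store.rows)  # the job pulls the data itself
    return sum(fetched)         # "lots of crunching", reduced to a sum here


class Receiver:
    """Stream style: a continuous receiver that is handed items one by one."""

    def __init__(self) -> None:
        self.total = 0

    def on_item(self, item: int) -> None:
        # Called by the producer for every item; the receiver never fetches.
        self.total += item


store = Store()
store.rows.extend([1, 2, 3])
batch_total = batch_job(store)

receiver = Receiver()
for item in [1, 2, 3]:
    receiver.on_item(item)
```

Both paths end with the same total; what differs is who initiates the transfer and when.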

floating-io 8 hours

From skimming the article, it seems that this is a munging of the terms in directions that just aren't meaningful.

I've had the following view from the beginning:

- Batches are groups of data with a finite size, delivered at whatever interval you desire (this can be seconds, minutes, hours, days, or years between batches).

- Streaming is when you deliver the data "live", meaning immediately upon generation of that data. There is no defined start or end. There is no buffering or grouping at the transmitter of that data. It's constant. What you do with that data after you receive it (buffering, batching it up, ...) is irrelevant.
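
A rough sketch of the two definitions above, assuming a hypothetical `event_stream` transmitter and a receiver-side `batched` helper. The stream here is finite only so the example terminates; a real stream would have no defined end.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def event_stream(n: int) -> Iterator[int]:
    """Hypothetical transmitter: hands over each event as soon as it exists."""
    for i in range(n):
        yield i  # no buffering or grouping on the sending side


def batched(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Receiver-side choice: group incoming events into finite batches."""
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# The source is still a stream; only the receiver decided to batch.
batches = list(batched(event_stream(7), 3))
```

Under these assumptions `batches` is `[[0, 1, 2], [3, 4, 5], [6]]`: the grouping happens after receipt, which is why it does not change what the transmitter is doing.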

JMHO.

10000truths 3 hours

"latency" and "throughput" are only mentioned in passing in the article, but that is really the crux of the whole "streaming vs. batch" thing. You can implement a stream-like thing with small, frequent batches of data, and you can implement a batch-like thing with a large-buffered stream that is infrequently flushed. What matters is how much you prioritize latency over throughput, or vice versa. More importantly, this can be quantified - multiply latency and throughput, and you get buffer/batch size. Congratulations, you've stumbled across Little's Law, one of the fundamental tenets of queuing theory!

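The latency-throughput relation above is Little's Law, L = λW: the average number of items in the system equals the arrival rate times the average time each item spends in it. A small worked sketch follows; the function names and the numbers are made up for illustration.

```python
def batch_size(throughput_per_s: float, latency_s: float) -> float:
    """Little's Law, L = lambda * W: items in flight for a given rate and latency."""
    return throughput_per_s * latency_s


def latency_for_batch(batch_items: float, throughput_per_s: float) -> float:
    """The same law rearranged, W = L / lambda: time to fill a batch of that size."""
    return batch_items / throughput_per_s


# Hypothetical pipeline sustaining 2,000 items per second.
stream_like = batch_size(2_000, 0.5)    # half a second of latency
batch_like = batch_size(2_000, 3_600)   # flushed once an hour
```

At the same throughput, the "stream-like" configuration holds 1,000 items and the "batch-like" one holds 7,200,000; the only thing that changed is the latency that was accepted.
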
binoct 5 hours

The comments here are really interesting to read since there are so many strongly stated, differing definitions. It’s obvious “streaming” and “batch” have different implications and even meanings in different contexts. Depending on the type of work being done and the system it’s being done with, batch and streaming can be interpreted differently, so it feels like a semantic argument that lacks specificity. It’s important to have common and clear terminology, and across the industry these words (like so many in computer science) are not always as clear as we might assume. That's part of what makes naming things so difficult.

It does seem to me that push vs pull is slightly more standardized in usage, which might be what the author is getting at. But even then, depending on what level of abstraction in the system you are concerned with, the concepts can flip.