Building an open data pipeline in 2024
RadiozRadioz 11 days
> And if you’re dealing with truly massive datasets you can take advantage of GPUs for your data jobs.

I don't think scale is the key deciding factor for whether GPUs are applicable to a given dataset.

I don't think this is a particularly insightful article. Read the first paragraph of the "Cost" section.


amadio 11 days
I take issue with this part of the article:

> In general, managed tools will give you stronger governance and access controls compared to open source solutions. For businesses dealing with sensitive data that requires a robust security model, commercial solutions may be worth investing in, as they can provide an added layer of reassurance and a stronger audit trail.

There are definitely open source solutions capable of managing vast amounts of data securely. The storage group at CERN develops EOS (a distributed filesystem based on the XRootD framework), and CERNBox, which puts a nice web interface on top. See https://github.com/xrootd/xrootd and https://github.com/cern-eos/eos for more information. See also https://techweekstorage.web.cern.ch, a recent event we held together with CS3 at CERN.


victor106 11 days
> Cloudflare R2 (better than AWS S3). The article links to [1].

Is R2 really better than S3?

https://dansdatathoughts.substack.com/p/from-s3-to-r2-an-eco...


esafak 11 days
Can someone explain this "semantic layer" business (cube.dev)? Is it just a signal registry that helps you keep track of and query your ETL pipeline outputs?
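Roughly, yes, but it is more than a registry: a semantic layer is a central place where metrics and dimensions are defined once, and queries against those logical names are compiled into SQL over the underlying tables, so every dashboard computes "revenue" the same way. A toy sketch of the idea (illustrative only; this is not Cube's actual API, and all names here are invented):

```python
# Toy semantic layer: a registry of metric/dimension definitions
# that compiles logical queries into concrete SQL.
# (Conceptual sketch only -- not how cube.dev is implemented.)

METRICS = {
    "revenue": "SUM(order_total)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "month": "DATE_TRUNC('month', created_at)",
    "country": "billing_country",
}

def compile_query(measures, dimensions, table="orders"):
    """Turn logical metric/dimension names into a SQL string."""
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{METRICS[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dimensions:
        # group by ordinal position of each dimension column
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql

print(compile_query(["revenue"], ["month"]))
# SELECT DATE_TRUNC('month', created_at) AS month,
#   SUM(order_total) AS revenue FROM orders GROUP BY 1
```

The point is that consumers ask for `revenue` by `month` without knowing the SQL, and changing a metric's definition in one place changes it everywhere.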

Phlogi 11 days
Why do I need SQLMesh if I use dbt/Snowflake?