Immediately after switching the page, it will work with CSR.
Please reload your browser to see how it works.

emmanueloga_ 6 daysReload

From the source, Documind appears to:

1) Install tools like Ghostscript, GraphicsMagick, and LibreOffice with a JS script. 2) Convert document pages to Base64 PNGs and send them to OpenAI for data extraction. 3) Use Supabase for unclear reasons.

Some issues with this approach:

* OpenAI may retain and use your data for training, raising privacy concerns [1].

* Dependencies should be managed with Docker or package managers like Nix or Pixi, which are more robust. Example: a tool like Parsr [2] provides a Dockerized pdf-to-json solution, complete with OCR support and an HTTP api.

* GPT-4 vision seems like a costly, error-prone, and unreliable solution, not really suited for extracting data from sensitive docs like invoices, without review.

* Traditional methods (PDF parsers with OCR support) are cheaper, more reliable, and avoid retention risks for this particular use case. Although these tools do require some plumbing... probably LLMs can really help with that!

While there are plenty of tools for structured data extraction, I think there’s still room for a streamlined, all-in-one solution. This gap likely explains the abundance of closed-source commercial options tackling this very challenge.

---

1: https://platform.openai.com/docs/models#how-we-use-your-data

2: https://github.com/axa-group/Parsr

vunderba 6 daysReload

OP, you've been accused of literally ripping off somebody's more popular repository and posing it as your own.

https://news.ycombinator.com/item?id=42178413

You may wanna get ahead of this because the evidence is fairly damning. Failing to even give credit to the original project is a pretty gross move.

infecto 6 daysReload

Multimodal LLM are not the way to do this for a business workflow yet.

In my experience your much better of starting with a Azure Doc Intelligence or AWS Textract to first get the structure of the document (PDF). These tools are incredibly robust and do a great job with most of the common cases you can throw at it. From there you can use an LLM to interrogate and structure the data to your hearts delight.

bob778 6 daysReload

From just reading the README, the example is not valid JSON. Is that intentional?

Otherwise it seems like a prompt building tool, or am I missing something here?

danbruc 6 daysReload

With such a system, how do you ensure that the extracted data matches the data in the source document? Run the process several times and check that the results are identical? Can it reject inputs for manual processing? Or is it intended to be always checked manually? How good is it, how many errors does it make, say per million extracted values?