Mar 25, 2026 · Engineering Blog / AI · 8 min read

From hackathon prototype to production at Flock

Abs Lamzini, Software Engineer

At Flock, some of the most interesting problems don't look that interesting at first.

This one started with a simple idea:

Can we extract structured data from a PDF?

Specifically, Confirmed Claims Experience (CCE) documents — reports that underwriters rely on to assess risk. These documents are essential, but working with them is slow, repetitive, and highly manual. Someone has to open the file, interpret tables, map columns, and input everything into a system.

On the surface, it sounds like a straightforward extraction problem.

In reality, it's anything but.

Documents vary significantly between insurers. Some are well-structured tables, others are loosely formatted, and many are scanned images. The same data might appear under different labels, and even within a single document, structure can shift.

Very quickly, it became clear:

This wasn't just about extracting text — it was about understanding documents the way a human does.

Underwriting assistants no longer extract CCE data by hand for new business quotes — the agent handles it. What used to take around 20 minutes per quote now happens automatically, saving underwriters roughly 3–4 hours each week on average.

More importantly, it shifted how we think about automation. This wasn't just a one-off improvement — it proved that agentic workflows can reliably take on real operational tasks. Since shipping this, it's opened up ongoing conversations across the team about what else we can automate, with areas like vehicle-level processing already being explored next.

Where it started

This project began as a hackathon experiment.

The initial version used Amazon Bedrock to process CCE PDFs and output structured data. It worked well enough on simple examples, but quickly broke down when faced with real-world variability.

The turning point came when we reframed the problem.

Instead of asking "how do we parse this document?", we started asking:

How would an underwriter approach this?

That shift changed how we designed everything that followed.

Learning from underwriters

Rather than building in isolation, we worked closely with underwriting assistants — the people who process these documents every day.

We spent time understanding how they read documents, how they interpret tables, and where they spend the most effort. What looks like intuition is actually a fairly structured process once you break it down.

We also introduced a feedback loop early. After shipping an initial version, underwriters flagged issues — incorrect mappings, missing data, inconsistencies. We tracked these, identified patterns, and iterated on the system accordingly. One of the first common issues was extracting the insurer name, which often appeared only as a company logo; the fix was a simple prompt update instructing the model to treat any logo as an indication of the insurer name.

Some fixes were prompt changes. Others required adjustments in how we extracted or validated data. Over time, this loop significantly improved accuracy and reliability.

System overview

At a high level, the system is a pipeline that combines deterministic processing with AI-based extraction.

Here's how it works end-to-end:

[Diagram: end-to-end processing pipeline]

The system can be triggered in two ways: via an API request, or automatically when a document is uploaded. Both routes feed into the same processing pipeline, ensuring consistent behaviour.

Each document is processed individually, then the results are aggregated, cleaned, and sent to downstream systems.
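As a rough sketch (function and field names here are illustrative, not Flock's actual code), both triggers can converge on one shared function so the behaviour stays identical regardless of entry point:

```python
def process_single_document(key: str) -> dict:
    # Placeholder for the per-document pipeline described below.
    return {"document": key, "claims": []}

def aggregate_and_clean(results: list[dict]) -> list[dict]:
    # Placeholder: deduplicate and clean before downstream systems.
    return results

def process_documents(document_keys: list[str]) -> list[dict]:
    """Shared pipeline: process each document, then aggregate."""
    results = [process_single_document(key) for key in document_keys]
    return aggregate_and_clean(results)

def handle_api_request(event: dict) -> list[dict]:
    # Route 1: explicit API call listing the documents to process.
    return process_documents(event["document_keys"])

def handle_s3_upload(event: dict) -> list[dict]:
    # Route 2: S3 upload event; pull out the object keys and reuse
    # the exact same pipeline as the API route.
    keys = [record["s3"]["object"]["key"] for record in event["Records"]]
    return process_documents(keys)
```

Because both handlers call `process_documents`, there is a single place to change when the pipeline evolves.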


How a single document is processed

The core of the system is how we handle each individual PDF.

The sequence looks like this:

[Diagram: per-document processing sequence]

For each document, we:

  1. Download the PDF
  2. Extract text and structure
  3. Build context for the model
  4. Use AI to extract structured data
  5. Validate and filter results

This separation of steps is intentional. It keeps the system modular and easier to reason about.
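The five steps above can be sketched as a chain of small functions; each body here is a stub standing in for the real logic, so the names and return values are assumptions for illustration only:

```python
def download_pdf(key: str) -> bytes:
    # Step 1: fetch the raw PDF (in production, from S3).
    return b"%PDF-..."

def extract_text_and_structure(pdf_bytes: bytes) -> str:
    # Step 2: text-layer extraction or OCR, depending on the document.
    return "claims table ..."

def build_context(text: str) -> str:
    # Step 3: reshape raw text into model-ready context.
    return f"Document content:\n{text}"

def extract_structured_data(context: str) -> list[dict]:
    # Step 4: AI extraction against a predefined schema.
    return [{"year": 2023, "claims_reported": 4}]

def validate_and_filter(claims: list[dict]) -> list[dict]:
    # Step 5: drop rows that fail basic validation rules.
    return [c for c in claims if c.get("year") is not None]

def process_document(key: str) -> list[dict]:
    pdf = download_pdf(key)
    text = extract_text_and_structure(pdf)
    context = build_context(text)
    claims = extract_structured_data(context)
    return validate_and_filter(claims)
```

Each step can be tested and swapped independently, which is what makes the pipeline easy to reason about.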


Handling different types of PDFs

One of the first challenges we encountered was that not all PDFs behave the same way. Some contain extractable text. Others are effectively images.

To handle this, we implemented a simple routing approach:

python
def extract_text(pdf_path: str) -> str:
    # Sample the first pages to check whether the PDF has a text layer.
    sample = extract_first_pages(pdf_path)

    if len(sample.strip()) > TEXT_THRESHOLD:
        # Text-based PDF: fast local extraction with pdfplumber.
        return extract_with_pdfplumber(pdf_path)
    else:
        # Scanned / image-based PDF: fall back to OCR via Textract.
        return extract_with_textract(pdf_path)

If the document contains usable text, we process it directly. If not, we fall back to OCR.

This approach keeps the system efficient for the common case, while still handling more complex inputs reliably.

From text to structured data

Extracting text is only part of the problem. The real challenge is interpreting it correctly.

CCE documents contain tables with multiple claim types, different time periods, and varying column structures.

Rather than relying entirely on rigid parsing logic, we use an AI model to perform structured extraction:

python
result = await agent.structured_output_async(
    prompt=prompt,
    output_model=ClaimsDataList
)
 
claims = result.claims

The model is given carefully prepared inputs — including table content and contextual information — and is required to return data that fits a predefined schema.

This combination of structured input and constrained output allows us to handle variability without losing control over the results.
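For illustration, a constrained output schema along these lines could be defined with Pydantic (the real ClaimsDataList will differ in its exact fields; these are assumptions based on the field names shown later in this post):

```python
from typing import Optional

from pydantic import BaseModel

class ClaimsData(BaseModel):
    # One row of claims experience for a single policy year.
    year: int
    vehicle_years: Optional[float] = None
    claims_reported: Optional[int] = None
    total_paid_and_outstanding: Optional[float] = None

class ClaimsDataList(BaseModel):
    # The model must return data that parses into this container,
    # so malformed output fails fast instead of flowing downstream.
    claims: list[ClaimsData]
```

Anything the model returns that does not fit the schema raises a validation error rather than silently producing bad records.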


Dealing with messy data

Real-world data introduces additional complexity. We frequently encountered duplicate records across multiple documents, partially complete rows, and inconsistent formatting.

To handle this, we introduced simple but effective strategies.

For example, when multiple records exist for the same year, we keep the most complete one:

python
def completeness_score(claim):
    # Count how many key fields are populated; summing the raw values
    # would measure magnitude, not completeness, and break on missing data.
    fields = (
        claim.vehicle_years,
        claim.claims_reported,
        claim.total_paid_and_outstanding,
        claim.adft_excess,
        claim.ws_excess,
    )
    return sum(1 for value in fields if value is not None)

best = max(year_claims, key=completeness_score)

We also apply validation rules to filter out incomplete or irrelevant data before passing it downstream.
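A minimal sketch of one such rule, with illustrative field names: drop any row where every key numeric field is missing, since those rows carry no usable signal.

```python
# Fields of which at least one must be present for a row to be usable
# (names are illustrative assumptions, not the production rule set).
REQUIRED_ANY = ("claims_reported", "total_paid_and_outstanding")

def is_usable(claim: dict) -> bool:
    # Keep the row only if at least one key field is populated.
    return any(claim.get(field) is not None for field in REQUIRED_ANY)

def filter_claims(claims: list[dict]) -> list[dict]:
    return [claim for claim in claims if is_usable(claim)]
```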

From prototype to production

To make this usable in production, we integrated the pipeline into our platform.

Documents uploaded to S3 automatically trigger processing. The system runs in AWS Lambda, extracts and structures the data, and updates the Quotes system with the results.

Under the hood, the stack is intentionally simple and built around managed services:

  • AWS S3 for document storage and event triggers
  • AWS Lambda for running the processing pipeline
  • Amazon Bedrock (Claude 3 Haiku) for structured extraction
  • AWS Textract for OCR on scanned PDFs
  • pdfplumber for fast extraction from text-based PDFs
  • AWS CDK for defining and deploying infrastructure

The system itself is split across Python and TypeScript, with Python handling the extraction and agent pipeline, and TypeScript used for surrounding platform services and infrastructure.

The overall architecture is serverless, event-driven, and modular. This keeps the system scalable, cost-efficient, and relatively simple to operate — while still being flexible enough to evolve as we add new document types and workflows.
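For context, the S3-to-Lambda wiring described above might look roughly like this in CDK (Python); this is an infrastructure sketch, and the bucket, function, runtime, and asset names are illustrative assumptions rather than Flock's actual stack:

```python
from aws_cdk import Stack, aws_lambda as _lambda, aws_s3 as s3
from aws_cdk.aws_s3_notifications import LambdaDestination
from constructs import Construct

class CcePipelineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str) -> None:
        super().__init__(scope, construct_id)

        # Bucket receiving uploaded CCE documents.
        bucket = s3.Bucket(self, "CceDocuments")

        # Lambda running the extraction pipeline.
        processor = _lambda.Function(
            self, "CceProcessor",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="app.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # Trigger processing automatically on every new upload.
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED, LambdaDestination(processor)
        )
```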

What we learned

A few things became clear as we built this system.

One of the biggest surprises was that the largest improvements didn't come from changing models, but from improving how we structured the problem for the model.

Early on, we treated the model like a general-purpose extractor — passing in large chunks of raw text and expecting it to infer structure correctly. This worked inconsistently, especially on documents with multiple tables, ambiguous column layouts, or missing context.

What made a much bigger difference was tightening both the inputs and the instructions.

On the input side, we stopped sending raw text and instead reshaped it into something closer to how a human would read the document. That meant extracting and isolating the relevant sections — structuring table data, separating claims content from excess values, and providing clearer document context. By the time the model saw the input, much of the ambiguity had already been removed.
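A sketch of what that reshaping might look like; the section labels and helper signature are assumptions for illustration, not the production code:

```python
def build_model_context(tables: list[str], excess_lines: list[str],
                        insurer: str) -> str:
    # Assemble pre-separated sections so the model sees structured,
    # disambiguated input rather than one blob of raw text.
    parts = [f"Insurer: {insurer}", "", "Claims tables:"]
    parts += [f"- {table}" for table in tables]
    parts += ["", "Excess values:"]
    parts += [f"- {line}" for line in excess_lines]
    return "\n".join(parts)
```

By the time this string reaches the model, claims content and excess values are already separated, so the model never has to guess which section a number belongs to.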

On the instruction side, being explicit mattered more than expected. Providing concrete examples — particularly for how to extract things like excess values — significantly improved accuracy. Just as importantly, we added rules for what not to do. For example, explicitly telling the model to ignore rows with empty or incomplete values prevented it from attempting to "fill in the gaps," which often led to invalid outputs.
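As an illustration (the wording here is invented, not the production prompt), the positive and negative rules might read:

```python
# Hypothetical prompt fragment showing explicit do / don't rules.
EXTRACTION_RULES = """
- Extract one record per policy year.
- For excess values, read both the AD/FT and W/S columns separately.
- Do NOT invent values: if a row is empty or incomplete, skip it entirely.
- Do NOT fill gaps by averaging or copying from adjacent rows.
""".strip()
```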

We also saw that many issues we initially attributed to the model were actually surfaced through validation. Using Pydantic during development helped catch inconsistencies early and forced us to confront edge cases we might have otherwise missed. In practice, documents representing the same concept could look completely different — excess values might appear in a dedicated table, or just as a single line elsewhere in the document. Validation failures made these differences visible and pushed us to handle them more deliberately.

Taken together, these changes reduced a whole class of errors — from incorrect column mappings and duplicated periods, to invalid rows being treated as real data.

The takeaway for us was that model performance is often limited less by the model itself, and more by how clearly you define the task, structure the input, and constrain the output.

About the author


Abs Lamzini

Software Engineer

We're hiring

Want to work on problems like these?

We're building the technology that powers fleet insurance — from risk models to telemetry processing pipelines. Come build it with us.