Insurance is not an easy domain. There's a lot of implicit knowledge, every policy is an edge case, and the margin for error is low. Creating a platform that handles all of this is non-trivial. If your code isn't flexible and robust, you're going to struggle.
Software engineering best practices are mostly about common sense. They try to bring lessons from other engineering disciplines into software. There are subtle differences, of course: our industry doesn't have the same constraints (resources, physical limits) or the same history. These differences make it difficult to define a universal set of best practices; the right ones vary by industry, company, and team.
At Flock, we've found that the right practices don't just help you move fast: they make speed and reliability compound. In this post, I'll highlight some of the software engineering practices we've adopted and how they make that possible, including the mistakes we made along the way and what we learned from them.
How We Work
Our teams are small: three engineers per team, plus an Engineering Lead and a Product Manager. At that size, there's no room to hide behind process. Everyone knows what's happening, decisions get made quickly, and there's direct contact between engineering and product. That alone removes a significant amount of friction.
We follow trunk-based development: one main branch, small, focused PRs that merge directly to main and then release to production, with no long-lived feature branches. We keep PRs intentionally small, and that starts at ticket definition, where we break projects and stories into the smallest sensible chunks. Each PR requires minimal context and is easy to review.
Thanks to our microservices architecture (more about this here), two people rarely work on the same code at the same time. Given the atomic nature of our changes, the modularity of the services, and the clearly defined boundaries, we can scope the work for each engineer to be within a specific service (or part of a service). In the past, some boundaries were not clearly defined, and implementing a feature required changes in many places, including areas of the platform where our colleagues were working.
When you merge to main, the diff-based mechanisms in our CI/CD pipeline mean you don't have to wait for the whole repo to build and test if you're only changing a small part of a single service.
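As an illustration, here's a minimal sketch of how a diff-based check might decide which services to rebuild. The directory layout (`services/<name>/...`) is hypothetical, and our actual pipeline logic differs in detail:

```ts
import { execSync } from "node:child_process";

// List the files changed relative to main.
const changedFiles = execSync("git diff --name-only origin/main...HEAD", {
  encoding: "utf8",
})
  .split("\n")
  .filter(Boolean);

// Map each changed file to the service that owns it.
const affectedServices = new Set(
  changedFiles
    .map((file) => /^services\/([^/]+)\//.exec(file)?.[1])
    .filter((name): name is string => name !== undefined),
);

// Only the affected services get rebuilt and retested.
console.log([...affectedServices]);
```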
Fast Feedback
We have a functional approach, event-driven architecture, and serverless infrastructure, combined with a strong type system. This makes it easy to test at multiple levels (unit, integration, and application tests), and to trust the feedback loop, especially on the backend.
The type system is the first line of defence. With everything strongly typed, a large class of bugs simply can't exist: undefined values and mismatched shapes get caught before they reach runtime.
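A contrived example of the kind of bug the compiler rules out (the types here are illustrative, not our real domain model):

```ts
type PolicyStatus = "draft" | "active" | "cancelled";

function describe(status: PolicyStatus): string {
  switch (status) {
    case "draft":
      return "Not yet bound";
    case "active":
      return "In force";
    case "cancelled":
      return "No longer in force";
    // No default needed: if a new status is added to the union, every
    // non-exhaustive switch becomes a compile error, not a runtime surprise.
  }
}
```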
We previously used io-ts for runtime validation, but as TypeScript evolved and io-ts slowed down, we built our own types library (parsers + generators). That investment also enabled automated type generation across services, critical in a system with as many moving parts as ours.
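We won't reproduce the library's internals here, but the core idea of a parser is simple: take untrusted input and either return a typed value or fail loudly. A minimal sketch, with a hypothetical API rather than our actual one:

```ts
// A parser turns unknown input into a typed value, or throws.
type Parser<T> = (input: unknown) => T;

interface Invoice {
  id: string;
  amountPence: number;
}

const parseInvoice: Parser<Invoice> = (input) => {
  const candidate = input as { id?: unknown; amountPence?: unknown };
  if (
    typeof input === "object" &&
    input !== null &&
    typeof candidate.id === "string" &&
    typeof candidate.amountPence === "number"
  ) {
    return { id: candidate.id, amountPence: candidate.amountPence };
  }
  throw new Error("Invalid Invoice payload");
};

// At the service boundary, unknown data becomes trusted, typed data.
const invoice = parseInvoice(JSON.parse('{"id":"inv_1","amountPence":12500}'));
```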
Writing tests is well embedded in the team culture: all code ships with tests. A functional approach makes this easier: pure functions have no hidden dependencies, so you can test each one in isolation without setting up the world first. Our unit test suite runs in seconds with Vitest. Our type generation also gives us mocks for free for any type defined in a contract. For integration tests, we run a copy of the database schema locally in Docker, so we can test against real database behaviour without touching shared infrastructure.
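Pure functions keep the tests themselves simple. A hypothetical example, in the style of our Vitest suites:

```ts
import { describe, expect, it } from "vitest";

// A pure function: the output depends only on the input, no setup required.
const applyRebatePence = (amountPence: number, rebatePct: number): number =>
  Math.round(amountPence * (1 - rebatePct / 100));

describe("applyRebatePence", () => {
  it("applies a percentage rebate", () => {
    expect(applyRebatePence(10_000, 10)).toBe(9_000);
  });

  it("leaves the amount unchanged for a 0% rebate", () => {
    expect(applyRebatePence(10_000, 0)).toBe(10_000);
  });
});
```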
In the past, we had E2E tests. They worked well for a while, but eventually became a bottleneck: slow and unreliable. After careful analysis, we concluded they added little beyond what our application tests already covered and duplicated logic tested elsewhere, so we removed the suite. We're now adding a new set of E2E tests focused on the true end-to-end lifecycle of a risk (an insurance policy).
Our service architecture follows CQRS (Command Query Responsibility Segregation), separating write operations (commands, domain logic) from read operations, with a clear infrastructure layer underneath. This keeps each layer focused and testable.
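In sketch form, with simplified, hypothetical types:

```ts
// Write side: commands carry intent and go through domain logic.
interface CancelPolicy {
  policyId: string;
  reason: string;
}

interface Policy {
  cancel(reason: string): void;
}

interface PolicyRepo {
  load(id: string): Promise<Policy>;
  save(policy: Policy): Promise<void>;
}

async function handleCancelPolicy(cmd: CancelPolicy, repo: PolicyRepo): Promise<void> {
  const policy = await repo.load(cmd.policyId);
  policy.cancel(cmd.reason); // domain rules live here
  await repo.save(policy); // the infrastructure layer persists and publishes
}

// Read side: queries return shaped views, with no domain logic involved.
interface PolicyViews {
  summaryFor(id: string): Promise<{ id: string; status: string }>;
}

const getPolicySummary = (id: string, views: PolicyViews) => views.summaryFor(id);
```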
When you open a PR, a namespaced service (a per-PR deployment of the service in our dev environment) is created, and the application tests run against it.
For example, you might have a lambda named policy-document-generation-live. If you make changes to that service, when you create a PR our pipeline creates a namespaced version (e.g. policy-document-generation-pr-0000). During testing, everything exercised for that service is pointed at the namespaced lambda.
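Resolving the target is just string plumbing, roughly along these lines (the environment variable name is hypothetical):

```ts
// Point calls at the per-PR deployment when a namespace is set,
// otherwise at the shared environment.
const namespace = process.env.SERVICE_NAMESPACE; // e.g. "pr-0000"

const lambdaName = (service: string): string =>
  namespace ? `${service}-${namespace}` : `${service}-live`;

lambdaName("policy-document-generation");
// => "policy-document-generation-pr-0000" during PR testing
```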
Application tests aren't meant to test logic (that's covered by unit and integration tests); they check that inputs and outputs match the expected shapes and that errors are returned correctly. Running them against a real (if namespaced) service catches the category of issues that only appear in a deployed context.
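An application test therefore looks less like a logic test and more like a contract check. Roughly, with a hypothetical endpoint and fields:

```ts
import { expect, it } from "vitest";

it("returns a document reference in the expected shape", async () => {
  const res = await fetch(`${process.env.SERVICE_URL}/documents`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ policyId: "pol_123" }),
  });

  // Assert status and shape, not business logic.
  expect(res.status).toBe(200);
  const body = await res.json();
  expect(typeof body.documentId).toBe("string");
  expect(typeof body.url).toBe("string");
});
```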
Errors surface quickly, and you know if your change works within minutes.
The Pipeline
Our automated CI/CD pipeline is one of the main reasons we can ship quickly with confidence. We have three environments, on top of the local set-up:
- Development: where we can break things
- Staging: a mirror of production with a separate database, used for manual testing and full-system checks
- Production: production
Every PR has to pass unit tests, integration tests, application tests, linters, dependency security audits, and formatters before it can be merged. Once approved by at least one engineer, it merges to main. After merging, the checks run again, but only for the parts of the repo that changed, so a change to one service doesn't trigger a rebuild of everything. Then it deploys to staging, and then to production. The time from merged PR to production is usually under 30 minutes, and often less for smaller services.
When something fails mid-pipeline, it stops there. The problem is isolated before it reaches production, and because deploys are frequent and small, rolling back is trivial: it's just reverting a small, known change.
For the frontend, every PR also gets its own namespaced environment with a unique URL. You can access the full platform on your branch without running anything locally. This makes it easy for product and design to review changes without waiting for them to be merged into main.
You're deploying constantly, and every deploy has been tested against real infrastructure.
Architecture: Designed for Isolation
Being serverless gives us strong independence between services. The main coupling is in the contracts (the agreed interfaces between services), which requires teams to communicate about changes. We avoid deploying breaking changes without a clear rollout strategy.
In the past, we ran into two kinds of coupling:
- Domain coupling: as the business grew, our domain model changed. Instead of rethinking the domain boundaries, we kept building on top of the existing structure, which increased cross-service dependencies over time.
- "Orchestrators": when a workflow needed data from multiple services, we introduced coordinator services that called other services to gather context and trigger actions. These orchestrators became a coupling hotspot: teams could no longer change services independently without also updating (or accounting for) the orchestrators.
Today, we put a lot of emphasis on maximising service decoupling and have removed that orchestrator layer. Each service has the information it needs to do its job; when it needs more context, we provide it explicitly via events, or via projections (local copies of data built from the platform event stream).
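A projection is just a local read model kept up to date from events. In sketch form, with a hypothetical event and store:

```ts
// An event from the platform stream (simplified).
interface PolicyCancelled {
  type: "PolicyCancelled";
  policyId: string;
  cancelledAt: string;
}

// The service's local copy of the data it needs: no cross-service call
// at request time, just a lookup in its own store.
const cancelledPolicies = new Map<string, string>();

function applyEvent(event: PolicyCancelled): void {
  cancelledPolicies.set(event.policyId, event.cancelledAt);
}

// Answering "is this policy cancelled?" later is purely local.
const isCancelled = (policyId: string): boolean => cancelledPolicies.has(policyId);
```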
Our event-driven architecture means new functionality can be plugged in, or old functionality switched off, without touching other services. For example, when we refactored our invoices service from Scala to TypeScript, we could disconnect the original service from the event bus and plug the new one in. The domains are separated with clear boundaries, not just on the backend, but also on the frontend, where we've done significant refactoring to make sure teams can work in the same repo without getting in each other's way.
Everything is encapsulated in its own service, so changes stay local and teams can work independently without blocking each other.
Types and Contracts
In an event-driven architecture with many services, keeping types in sync is a real problem. Without a solution, services drift: one team updates an API, another doesn't notice, and the event payload (the structure of the data each service sends and receives) silently changes shape. Since all events flow through EventBridge, nothing breaks at the transport layer. The receiving service still gets a message; it just gets the wrong one, and the mismatch surfaces as a logic bug rather than a type error.
We solved this by automating the entire process. We define API contracts in OpenAPI files and event shapes in YAML files. Our custom type generation library pulls those definitions from S3 (where we publish them automatically on deploy) and generates TypeScript types directly from the source. Every service gets its types from the same place, and they're always up to date. A useful side effect: the type generation also produces mocks, so unit testing the interface layer doesn't require any extra setup.
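On the consuming side, the output looks something like this (illustrative only; the real generated code is richer):

```ts
// Generated from the published contract -- not written by hand.
export interface InvoiceCreatedEvent {
  invoiceId: string;
  policyId: string;
  amountPence: number;
}

// The generator also emits a mock factory, so tests need no manual fixtures.
export const mockInvoiceCreatedEvent = (
  overrides: Partial<InvoiceCreatedEvent> = {},
): InvoiceCreatedEvent => ({
  invoiceId: "inv_000",
  policyId: "pol_000",
  amountPence: 0,
  ...overrides,
});
```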
We define the contract once. It propagates everywhere it's needed.
Services don't drift apart, and incompatibilities get caught before they reach production.
Observability and Recovery
On the backend, we use Datadog to monitor our AWS services. We've built dashboards around the things that matter most. Dead Letter Queues (DLQs) are a good example: in an event-driven system, messages that fail to process and land in a DLQ are often the first sign that something is wrong. We have alerts set up per team and per environment, so the right people find out immediately when something breaks.
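In AWS CDK terms, the pattern is roughly this (a sketch, not our actual infrastructure code):

```ts
import * as cdk from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as sqs from "aws-cdk-lib/aws-sqs";

export class ClaimsQueueStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    super(scope, id);

    // Failed messages land here after three delivery attempts.
    const dlq = new sqs.Queue(this, "ClaimsDlq");

    new sqs.Queue(this, "ClaimsQueue", {
      deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
    });

    // Any message sitting in the DLQ trips the alarm, which routes
    // to the owning team's alert channel.
    new cloudwatch.Alarm(this, "ClaimsDlqAlarm", {
      metric: dlq.metricApproximateNumberOfMessagesVisible(),
      threshold: 1,
      evaluationPeriods: 1,
    });
  }
}
```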
On the product side, we recently migrated from Pendo to PostHog for tracking user behaviour. What's made the biggest difference is how much easier it is to answer specific questions: which features are being used, where users drop off, whether a new prototype is getting traction. Previously, getting that kind of data required a significant amount of manual work. Now it's much faster, which means we can make better-informed decisions about what to build and fix.
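Instrumentation is lightweight: on the frontend, capturing an event is a single call (the event name, properties, and keys below are placeholders):

```ts
import posthog from "posthog-js";

// Initialised once at app startup.
posthog.init("<project-api-key>", { api_host: "https://eu.posthog.com" });

// Capture a product event, with properties for funnels and breakdowns.
posthog.capture("quote_completed", {
  product: "commercial-motor",
  durationSeconds: 42,
});
```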
When something breaks, you know about it immediately. And because deploys are small and rollbacks are fast, recovery is quick.
Conclusion
Flock has forged its own engineering culture through years of changes, errors, and some wins. The practices listed here are only part of what software engineering means at Flock, but they represent what matters most to us.
The common assumption is that speed and reliability are in tension, that moving fast means accepting more risk. What we've found is the opposite: the practices that make you reliable are largely the same ones that let you move fast. Fast feedback means you don't spend days debugging. Isolated architecture means you're not waiting for other teams. A strong type system means you're not chasing down runtime surprises. Good observability means incidents are short.
This isn't abstract for us: it's what makes big changes and fast iteration possible in practice.
Recently, we did a major refactor of one of our core services. It required careful domain modelling and design, but the integration with the rest of the platform was smooth. Because our services have clear boundaries and communicate through explicit contracts, we could reuse the surrounding capabilities (documents, rebates, claims, and more) as building blocks. Where we needed changes, they were small and incremental, not the kind of "stop the world" refactors that break other teams.
That same foundation matters even more when you start building AI-powered features. AI introduces a different kind of complexity: probabilistic outputs, new failure modes, and behaviour that's harder to reason about. But with strong typing, fast automated feedback, and isolated services, we can ship these features with confidence.
One concrete example is our Claims Experience document extraction pipeline. We integrated a new AI capability (more on this in its own post) that works directly with production data and fits into our existing domain and architecture. The result was a smoother quoting journey and a measurable reduction in quoting time (minutes saved per quote), delivered without destabilising the platform.
Another example is our internal Referrals page. We went from idea to a first usable version in under a week, then iterated continuously based on feedback from internal users, customer signals, and metrics. Because we were building on an existing service, and because our practices (clear boundaries, explicit contracts, strong typing, and fast automated feedback) keep changes safe, we could extend the domain model, publish and consume the right events on the bus, and stand up new UI quickly, with APIs and types defined and generated in the same way as the rest of the system.
At Flock, the engineering mission sits within a bigger one: making the world quantifiably safer. That's not a metaphor. It means the platform has to be accurate, reliable, and able to handle edge cases at scale in a domain that's full of them. The practices described here are how we make sure the software is good enough to actually do that.