Build vs buy: should you develop AI safety monitoring in-house?

Sep 26, 2025
Updated: Apr 14

As an engineer, I understand the appeal of building things yourself. There's a clarity to owning the entire stack, understanding every component, and having the flexibility to change anything at any time. When your organisation starts exploring computer vision AI for safety, the question "could we build this ourselves?" is a natural one.

The honest answer is: yes, technically you could. But the more useful question is whether you should. And the data strongly suggests that for most organisations, the answer is no.

What the research says about build vs buy

MIT's 2025 GenAI Divide report provides the clearest data point on this question. Across their analysis of 300 public AI deployments, AI tools purchased from specialised vendors succeeded roughly 67% of the time. Internal builds succeeded only about 33% of the time.

That's a 2:1 success ratio in favour of buying. And the gap isn't primarily about capability. As the MIT researchers noted, enterprises were building their own tools almost everywhere they looked. The problem was that purchased solutions, backed by specialised vendor expertise, delivered more reliable results.

This finding aligns with the broader AI pilot data. The RAND Corporation found that over 80% of AI projects fail to reach production. Gartner predicted that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025. And S&P Global found that 42% of companies scrapped most of their AI initiatives in 2025.

The pattern is consistent: building AI in-house is harder, slower, and more expensive than most organisations expect.

What building actually involves

If you're genuinely considering developing a computer vision safety system in-house, it's worth understanding what the engineering scope looks like. I've worked on enough technical projects to know that the gap between "proof of concept" and "production system" is where most ambitions go to die.

Here's what you'd need to build.

A detection engine. This is the core computer vision model that identifies safety events from camera feeds: forklift-pedestrian proximity, exclusion zone breaches, speeding, PPE compliance. You'd need to train, validate, and continuously improve object detection and tracking models specific to industrial environments. Computer vision experts note that data annotation alone (labelling thousands of images to train the model) can account for a larger share of the budget than the model development itself.
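
To give a sense of what sits downstream of the detector, here is a minimal Python sketch of a forklift-pedestrian proximity check. It assumes a model has already produced tracked detections with positions mapped to floor coordinates in metres. The Detection type, the proximity_events function, and the 3-metre threshold are all hypothetical, chosen for illustration, and do not describe any vendor's implementation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Detection:
    """One tracked object in a single frame (hypothetical schema)."""
    track_id: int
    label: str   # e.g. "forklift" or "pedestrian"
    x_m: float   # floor position in metres, assumed already calibrated
    y_m: float


def proximity_events(detections, min_distance_m=3.0):
    """Return (forklift_id, pedestrian_id, distance) for each pair closer than the threshold."""
    forklifts = [d for d in detections if d.label == "forklift"]
    pedestrians = [d for d in detections if d.label == "pedestrian"]
    events = []
    for f in forklifts:
        for p in pedestrians:
            distance = hypot(f.x_m - p.x_m, f.y_m - p.y_m)
            if distance < min_distance_m:
                events.append((f.track_id, p.track_id, round(distance, 2)))
    return events


# Illustrative frame: one forklift and two pedestrians.
frame = [
    Detection(1, "forklift", 0.0, 0.0),
    Detection(2, "pedestrian", 1.5, 2.0),   # 2.5 m away: flagged
    Detection(3, "pedestrian", 6.0, 8.0),   # 10 m away: ignored
]
print(proximity_events(frame))  # [(1, 2, 2.5)]
```

The geometry is the easy part. What this sketch takes for granted (reliable detection, tracking, and per-site calibration) is where the training and annotation effort described above goes.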

Edge processing infrastructure. For safety monitoring, you need real-time processing. Sending live video streams to the cloud introduces latency that makes real-time detection unreliable. That means on-premise hardware: GPU-equipped processing units at each site, network integration with your CCTV system, and the infrastructure to manage it. Development costs for real-time surveillance AI systems typically range from $70,000 to $250,000 or more, before accounting for ongoing maintenance.

A workflow and coaching platform. Detection alone is worthless if the data doesn't reach the right people in a useful format. You'd need to build a platform that presents safety events to supervisors, supports coaching workflows, generates heatmaps and trend reports, and provides dashboards for management review. This is a full software product, not a bolt-on.

Privacy controls. In any environment where cameras are monitoring workers, privacy is non-negotiable. You need face and person blurring, data retention controls, access management, and compliance with privacy regulations. At inviol, we process 99% of data on-premise and are SOC2, ISO 27001, and GDPR compliant. Building equivalent privacy infrastructure from scratch is a significant undertaking.

Ongoing model maintenance. Computer vision models degrade over time as environments change (new racking layouts, different lighting, seasonal variations, new vehicle types). You'd need a team continuously monitoring model performance, retraining on new data, and deploying updates across sites. As Viso.ai notes, many computer vision applications never make it to production because the shift from lab conditions to real-world environments is where it gets challenging and expensive.
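
One way to picture that monitoring is a rolling check on detection precision, sketched below in Python. It assumes reviewers label a sample of alerts as true or false positives each week. The function name, the baseline, the tolerance, and the review counts are hypothetical values for illustration only.

```python
def weeks_needing_review(weekly_reviews, baseline_precision, tolerance=0.05):
    """Flag weeks where reviewed precision falls more than `tolerance` below baseline.

    weekly_reviews: list of (week_label, true_positives, false_positives),
    taken from a human-reviewed sample of alerts (an assumed process).
    """
    flagged = []
    for week, true_pos, false_pos in weekly_reviews:
        reviewed = true_pos + false_pos
        if reviewed == 0:
            continue  # nothing reviewed this week, so no evidence either way
        precision = true_pos / reviewed
        if precision < baseline_precision - tolerance:
            flagged.append((week, round(precision, 2)))
    return flagged


# Illustrative numbers only: precision slips in week 3, e.g. after a layout change.
reviews = [
    ("week 1", 92, 8),
    ("week 2", 90, 10),
    ("week 3", 78, 22),
    ("week 4", 0, 0),
]
print(weeks_needing_review(reviews, baseline_precision=0.90))  # [('week 3', 0.78)]
```

A flagged week is only the trigger. Someone still has to collect new footage, relabel it, retrain, and roll the update out to each site.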

The hidden costs that catch teams off guard

The initial development cost is rarely the problem. It's the ongoing cost that breaks the business case.

Talent. Computer vision engineers, ML ops specialists, and safety-domain data scientists are expensive and hard to recruit. You're competing with tech companies for the same talent pool. And if a key engineer leaves, their knowledge of your custom system goes with them.

Iteration speed. A specialist vendor has deployed across dozens or hundreds of sites. They've seen every edge case: low lighting, reflective floors, overlapping camera angles, mixed vehicle types, seasonal workforce changes. Each deployment makes the product better for every customer. Your internal build learns only from your own sites, and each iteration takes months, whereas a vendor ships continuous improvements drawn from its entire customer base.

Opportunity cost. Every month your internal team spends building a safety AI platform is a month they're not working on other priorities. The ASSP's research emphasises that AI adoption in EHS can begin at a small scale with immediate impact. An internal build, by contrast, typically requires nine months or more before reaching production (MIT found that top performers deploy in 90 days while enterprises average nine months or longer).

Compliance burden. If your system processes worker video data, you need to meet privacy and security standards. SOC2 Type II attestation, ISO 27001 certification, and GDPR compliance aren't add-ons; they're requirements for any system handling sensitive operational data. Achieving and maintaining these certifications is expensive and ongoing.
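
A rough way to test the business case is to add these recurring items to the initial build, as in the Python sketch below. Only the $70,000 to $250,000 build range comes from the figures quoted earlier. The annual costs and the three-year horizon are placeholders to replace with your own estimates, and total_cost_of_ownership is a hypothetical helper.

```python
def total_cost_of_ownership(initial_build, annual_costs, years):
    """Initial build plus recurring annual costs over the given number of years."""
    return initial_build + sum(annual_costs.values()) * years


# Placeholder annual figures (not from any source): substitute your own estimates.
annual_costs = {
    "talent": 300_000,
    "infrastructure": 40_000,
    "model_maintenance": 60_000,
    "compliance": 50_000,
}

for initial_build in (70_000, 250_000):  # build range cited in the article
    total = total_cost_of_ownership(initial_build, annual_costs, years=3)
    share = initial_build / total
    print(f"build ${initial_build:,} -> 3-year total ${total:,} (build is {share:.0%} of total)")
```

With these placeholder inputs, the recurring items outweigh the initial build at both ends of the range. Your own numbers may differ, but the exercise makes the ongoing costs explicit before you commit.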

When building might make sense

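I want to be fair to the build argument. There are scenarios where developing in-house capability is justified.

If computer vision safety is your core product rather than an internal tool, building it yourself may be justified.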

If you have genuinely unique detection requirements that no existing platform supports, and those requirements are so specific that a vendor couldn't reasonably accommodate them, building a custom model for that specific use case may be warranted.

If you already have a mature ML engineering team with experience deploying computer vision models at scale, the build cost and risk are lower than for a team starting from scratch.

For most warehousing, logistics, manufacturing, and retail operations, none of these conditions apply. Safety monitoring is a tool to support your operations, not your core product. The detection requirements (forklift-pedestrian proximity, exclusion zones, speeding, PPE) are well-established use cases that specialist platforms have already solved. And the ML engineering talent required to build and maintain a custom system is disproportionate to the problem.

What buying gives you

When you choose a specialist vendor, you're not just buying software. You're buying the accumulated learning from every deployment that vendor has ever done.

At inviol, that means detection models trained on thousands of hours of real industrial footage across warehousing, logistics, manufacturing, cold storage, and retail environments. It means an on-premise processing architecture where 99% of data stays on-site. It means a coaching and training platform designed specifically for safety teams. It means heatmaps, trend reports, and dashboards built for shift-by-shift analysis and management review. And it means SOC2, ISO 27001, and GDPR compliance already in place.

You also buy deployment speed. Most inviol deployments go live within weeks, using a selection of existing CCTV cameras in the highest-risk areas rather than requiring full-site coverage. You're generating safety data and coaching insights while an internal build would still be in the data annotation phase.

The compliance dimension

Under the Health and Safety at Work Act 2015, New Zealand PCBUs must identify and manage workplace risks so far as is reasonably practicable. In Australia, WHS legislation imposes similar obligations. In the U.S., OSHA emphasises proactive hazard identification.

The "reasonably practicable" test considers what's available and what's feasible. A proven, commercially available AI safety platform that connects to existing cameras and deploys in weeks is very clearly available and feasible. If an incident occurs and a regulator asks what proactive measures your organisation had in place, "we were building our own system but hadn't finished yet" is not a strong answer.

The engineer's honest assessment

I'm an engineer. I respect the craft of building things well. But I also know that the best engineering decisions aren't about what you can build. They're about where to invest your effort for the greatest return.

For most operations, building a computer vision safety system in-house means spending 12 to 18 months and significant capital to reach a point that a specialist vendor already occupies. During that time, your sites have no AI safety coverage, your safety team has no new data, and your workers have no additional protection.

The pragmatic choice is to buy the platform, deploy it fast, and focus your engineering and operations effort on what you do best: running safe, efficient operations.

If you'd like to see how quickly you could be generating safety data across your sites, book a demo with inviol. We'll walk you through the deployment process, the integration with your existing cameras, and the data you'd see in the first week.

Frequently Asked Questions

What is the success rate for in-house AI builds vs buying from vendors?

MIT's 2025 GenAI Divide report found that AI tools purchased from specialised vendors succeed roughly 67% of the time, while internal builds succeed only about 33%. The gap is primarily driven by vendor expertise, faster deployment, proven implementation patterns, and continuous model improvement across a broader customer base.

How much does it cost to build a computer vision safety system in-house?

Development costs for real-time surveillance and safety AI typically range from $70,000 to $250,000 or more for the initial build. However, the total cost of ownership (including data annotation, infrastructure, talent, ongoing model maintenance, privacy compliance, and platform development) is significantly higher. Computer vision cost experts note that overlooking infrastructure and data details can cause the true cost to be underestimated by 70%.

How long does it take to deploy an AI safety monitoring platform?

Specialist vendors like inviol typically deploy within weeks, using existing CCTV cameras and on-premise processing. Internal builds typically take 9 to 18 months to reach production, based on MIT's finding that top performers deploy in 90 days while enterprises average nine months or longer.

When does it make sense to build AI safety monitoring in-house?

Building may be justified if computer vision safety is your core product (not an internal tool), if you have genuinely unique detection requirements that no vendor can accommodate, or if you already have a mature ML engineering team with experience deploying computer vision at scale. For most warehousing, logistics, and manufacturing operations, the detection requirements are well-established use cases that specialist platforms have already solved.

Does inviol work with existing cameras?

Yes. inviol connects to existing CCTV infrastructure and only needs a selection of cameras focused on the highest-risk areas, not every camera on site. The system processes data on-premise (99% stays on-site) and is SOC2, ISO 27001, and GDPR compliant. Most deployments go live within weeks without requiring new camera hardware.
