Maritime Multi-Object Tracking: How AI Identifies Every Object on the Water

Detection tells you what is on the water. Multi-Object Tracking tells you what each contact is, where it came from, and where it is going — for every object in the scene, simultaneously. That is the difference between a snapshot and operational intelligence.

That distinction matters at sea. Knowing a vessel is present is not the same as knowing whether it is moving toward you, holding a steady course, or behaving erratically. MOT is what transforms raw detections into a persistent, evolving model of the maritime environment.

From Detection to Tracking: Adding the Dimension of Time

Object detection identifies things in a single image frame. A vessel here, a buoy there. But detection is stateless. Every frame starts from zero, with no memory of what came before.

Tracking adds the dimension of time. It connects detections across frames, assigning each contact a persistent identity and building a continuous motion history. Multi-Object Tracking does this for every object simultaneously.

The output is not a list of detections. It is a live scene model: who is present, how long each contact has been tracked, how it has moved, and what its trajectory suggests about its next position.

How Multi-Object Tracking Works

The Predict-Associate Loop

MOT repeats two steps on every frame: predict, then associate.
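As a rough sketch of that loop, with deliberately simplified stand-ins for the real predictor and matcher (a constant-velocity step and a greedy nearest-neighbour match in one dimension, neither of which is SEA.AI's implementation):

```python
def predict(track):
    """Advance a track's state one step with a constant-velocity model."""
    track["pos"] = track["pos"] + track["vel"]
    return track

def associate(tracks, detections, gate=2.0):
    """Greedily match each detection to the nearest predicted track."""
    matches = {}
    for i, d in enumerate(detections):
        best, best_dist = None, gate
        for j, t in enumerate(tracks):
            dist = abs(t["pos"] - d)
            if dist < best_dist and j not in matches.values():
                best, best_dist = j, dist
        if best is not None:
            matches[i] = best
    return matches

def step(tracks, detections):
    for t in tracks:
        predict(t)                        # 1. predict where each track should be
    return associate(tracks, detections)  # 2. match fresh detections to tracks

tracks = [{"pos": 10.0, "vel": 1.0}, {"pos": 50.0, "vel": -2.0}]
matches = step(tracks, [11.2, 47.5])
# detection 0 pairs with track 0 (predicted 11.0),
# detection 1 pairs with track 1 (predicted 48.0)
```

Real systems replace both stand-ins with stronger components, described in the next two sections.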

The Kalman Filter: Estimating State Under Uncertainty

Between frames, the tracker does not wait passively for the next image. It actively estimates where each tracked object should be right now.

This is where the Kalman filter comes in. A Kalman filter maintains a probabilistic belief about each object’s state: position, bearing, and size. It also maintains a measure of uncertainty in that belief. At each time step, it propagates the state estimate forward using the object’s motion model.

The prediction will not be exact. But it gives the association step a strong prior on where to look. That prior is what makes fast, reliable matching possible even in cluttered or noisy conditions.
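The predict step can be sketched in a few lines. This is a generic one-dimensional constant-velocity Kalman predict, not SEA.AI's filter, and the noise values are assumed:

```python
import numpy as np

# State x = [position, velocity]; P is its covariance (uncertainty).
dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])        # constant-velocity motion model
Q = np.eye(2) * 0.01              # process noise (assumed value)

x = np.array([100.0, 5.0])        # believed position and speed
P = np.eye(2)                     # current uncertainty

# Predict: propagate the state forward and grow the uncertainty.
x_pred = F @ x
P_pred = F @ P @ F.T + Q

print(x_pred)   # [105.   5.] — the prior handed to the association step
```

Note that the predicted covariance `P_pred` is larger than `P`: the filter is honest that its belief gets less certain the longer it goes without a fresh measurement.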

Association: Solving the Assignment Problem

When the next frame arrives with fresh detections, the tracker must solve the association problem: which detection belongs to which existing track?

The system scores every new detection against every predicted track position. It then finds the globally optimal assignment across the entire scene.

Detections that match no existing track initialise a new one. The system retires tracks that go unmatched for too long. This constant cycle of creating, updating, and pruning keeps the scene model accurate and current across every frame.
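For a small scene, the globally optimal assignment can be illustrated by brute force over all pairings; production trackers reach the same answer in polynomial time with the Hungarian algorithm. The positions below are invented for illustration:

```python
from itertools import permutations
import math

tracks = [(10.0, 20.0), (55.0, 5.0)]      # predicted track positions
dets   = [(54.0, 6.0), (11.0, 19.0)]      # fresh detections this frame

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Choose the pairing that minimises total distance across the whole scene,
# not just the best match for each detection in isolation.
best = min(permutations(range(len(dets))),
           key=lambda p: sum(dist(tracks[i], dets[p[i]])
                             for i in range(len(tracks))))
# best[i] is the detection index assigned to track i
print(best)   # (1, 0): track 0 ↔ detection 1, track 1 ↔ detection 0
```

Solving the assignment globally is what prevents two nearby tracks from fighting over the same detection.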

The Challenge of Tracking on Open Water

The open sea is one of the hardest environments for any tracking system. There are no lane markings, no fixed reference points, and no constraints on how objects move.

Targets range from 300-metre tankers to a person in the water: a scale difference of over 100 times. Lighting swings from blinding sun glare to near-total darkness. Waves create constant visual noise, and targets at distance can occupy just a few pixels.

Most tracking systems operate in pixel space. They follow objects based on where they appear in the image. But pixel displacement does not map cleanly to real-world motion. A distant vessel shifting five pixels may have travelled hundreds of metres, while the same shift for a nearby object corresponds to barely any movement at all.

Sea-Plane Coordinates: Tracking in the Physical World

SEA.AI’s tracker takes a different approach. It reasons in what SEA.AI calls sea-plane coordinates: each object’s real-world bearing and distance from the sensor, not its position on the image plane.

This eliminates the scale problem. The tracker models how objects move in physical space, not how their pixel positions change on a sensor. The tracker handles a vessel 2 km out just as reliably as one 200 metres away.
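To make the idea concrete, here is a minimal sketch of how a pixel position could be converted to bearing and distance on a flat sea, given the camera's height and orientation. The function name, field-of-view values, and camera height are illustrative assumptions, not SEA.AI's implementation:

```python
import math

CAM_HEIGHT_M = 15.0              # assumed sensor height above the waterline
HFOV_DEG, VFOV_DEG = 60.0, 40.0  # assumed fields of view
W, H = 1920, 1080                # image resolution in pixels

def pixel_to_sea_plane(u, v, pan_deg, tilt_deg):
    """Map image pixel (u, v) to (bearing_deg, distance_m)."""
    # Angular offset of the pixel from the optical axis (linear approximation).
    az_off = (u / W - 0.5) * HFOV_DEG
    el_off = (0.5 - v / H) * VFOV_DEG
    bearing = (pan_deg + az_off) % 360.0
    # The depression angle below the horizon fixes the range to the sea surface.
    depression_deg = -(tilt_deg + el_off)
    distance = CAM_HEIGHT_M / math.tan(math.radians(depression_deg))
    return bearing, distance

# A pixel just below the horizon line maps to a contact roughly 860 m out.
b, d = pixel_to_sea_plane(u=960, v=567, pan_deg=90.0, tilt_deg=0.0)
```

The steep geometry near the horizon is exactly why a few pixels of displacement can mean hundreds of metres of real motion, and why the tracker reasons in metres and degrees rather than pixels.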

The failure mode that undermines pixel-based trackers at range simply does not apply.

For more on how object detection works before tracking begins, see AI Object Detection at Sea. SEA.AI’s Brain extends this detection capability to third-party camera systems already installed on board.

Compensating for Camera Motion: The PTU Challenge

Some SEA.AI products are equipped with a Pan-Tilt Unit (PTU): a motorised camera mount that can pan a full 360 degrees and tilt continuously. This gives wide-area coverage. But it introduces a fundamental tracking challenge.

When an object appears to shift position between two frames, the tracker must answer a critical question: did the object move, or did the camera? From the raw image alone, the two are indistinguishable. Without separating them, every camera pan would cause the tracker to interpret every object in the scene as having simultaneously changed course.

SEA.AI’s tracker solves this by knowing the camera’s exact orientation at every moment. Before updating any track, it compensates for camera rotation. Only genuine object motion contributes to track updates.
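The compensation reduces to a one-line idea: an object's true bearing is the camera's pan angle plus its in-image offset, so the camera's own rotation between frames must be added back to the apparent in-image shift. A minimal illustration, with invented values:

```python
def true_bearing_change(apparent_shift_deg, pan_prev_deg, pan_now_deg):
    """Object bearing = camera pan + in-image offset, so the true change
    is the apparent in-image shift plus the camera's own rotation."""
    return apparent_shift_deg + (pan_now_deg - pan_prev_deg)

# Camera pans 5° to starboard; a stationary buoy slides 5° left in frame.
true_bearing_change(-5.0, pan_prev_deg=120.0, pan_now_deg=125.0)  # → 0.0
```

With the pan angle read from the PTU encoders at every frame, the stationary buoy correctly shows zero true motion even though it moved in the image.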

From Compensation to Active Target Following

This separation of camera motion and object motion also enables active target tracking.

Target tracking lets the operator select any tracked contact and have the system follow it automatically. The PTU adjusts its pan and tilt continuously to keep the selected contact centred in the frame as it moves across the water.

Because the tracker already separates camera movement from object movement, it can steer the camera with precision to stay locked on the target. Even as the vessel manoeuvres. Even as sea conditions change.

Why MOT Matters: From Pixels to Operational Intelligence

The ocean does not pause. Neither does SEA.AI. Every object tracked, every identity maintained, every trajectory followed — so operators and autonomous systems always have the persistent, structured scene model that accurate decisions depend on.

For naval and coast guard applications, continuous track identity enables anomaly detection. The system flags contacts that deviate from declared routes, loiter in restricted areas, or approach at speeds inconsistent with their AIS data.
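As a hedged illustration of one such check (the threshold and function are assumptions for this sketch, not a SEA.AI interface), an AIS-consistency test can be as simple as comparing tracked speed against reported speed:

```python
def speed_anomaly(tracked_speed_kn, ais_speed_kn, tolerance_kn=3.0):
    """Flag a contact whose visually tracked speed disagrees with
    the speed it reports over AIS by more than the tolerance."""
    return abs(tracked_speed_kn - ais_speed_kn) > tolerance_kn

speed_anomaly(18.0, 9.5)   # True: reports 9.5 kn, observed at 18 kn
```

Checks like this are only possible because the track persists: a single frame gives no speed at all.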

For autonomous and remotely operated vessels, MOT is a foundational capability. Reliable autonomous navigation requires persistent object identity and trajectory prediction to meet the contact awareness requirements of the International Regulations for Preventing Collisions at Sea (COLREGS). A system that restarts contact tracking from zero on each frame cannot support safe autonomous operations at sea. Emerging IMO guidelines on autonomous shipping are beginning to define standards that presuppose exactly this level of continuous tracking.

SEA.AI’s Watchkeeper and Sentry both integrate MOT as a core component of their perception stack. For a broader look at how AI closes the gap in maritime domain awareness, see AI Maritime Surveillance: From Data to Decisions.

FAQs

What is multi-object tracking (MOT) in maritime AI?

Multi-object tracking assigns a persistent identity to every detected contact and follows each one continuously across camera frames.

In maritime environments, MOT tells operators not just what is present on the water, but what each contact is, how it has been moving, and where it is likely to go next.

SEA.AI’s MOT system handles every object simultaneously, in real time.

How is multi-object tracking different from object detection?

Object detection identifies what is present in a single image frame. It has no memory of previous frames and cannot follow movement over time.

Multi-object tracking adds the time dimension: it connects detections across frames, assigns persistent identities, and builds continuous trajectory histories.

Detection tells you what is on the water right now. Tracking tells you who it is and what it has been doing.

What is a Kalman filter and what does it do in tracking?

A Kalman filter is a mathematical model that estimates where each tracked contact will be in the next frame, based on its current state and motion history.

It also maintains a measure of uncertainty in that estimate. In maritime tracking, the Kalman filter allows the system to predict contact positions reliably even when detections are noisy or temporarily unavailable, giving the association step a strong prior at every frame.

How does SEA.AI handle camera movement on a Pan-Tilt Unit?

SEA.AI equips some products with a Pan-Tilt Unit (PTU) that allows the camera to pan 360 degrees and tilt continuously.

To prevent camera rotation from corrupting object tracks, the tracker knows the camera’s exact orientation at every frame and compensates for its movement before updating any track.

Only genuine object motion affects tracked contacts. This compensation also enables active target following: the PTU steers automatically to keep a selected contact centred in view as it moves.

Related articles

Maritime Multi-Object Tracking on SEA.AI App

AI in Maritime Surveillance: From Raw Data to Actionable Intelligence 

Maritime surveillance depends on AIS and radar. When either fails, goes dark, or gets spoofed, the operational picture disappears. AI-powered visual intelligence is how you close that gap.

What is Maritime AI Object Detection

AI object detection identifies and locates maritime objects using deep neural networks, enabling safer vessel navigation and collision avoidance.