Beyond On-Time Percentage: What a Real Carrier Scorecard Measures
On-time percentage grades the past. A real carrier scorecard weighs tender acceptance, dwell, claims, POD timeliness, and performance drift, per lane and per facility.
Mithrilis Team
14 min read
Last updated: July 2, 2026.
Ask a room full of freight brokers what makes a good carrier scorecard and most will point to one number: on-time percentage. It is the number that fits on a slide, the number the customer asks about, the number the TMS reports without any extra work. It is also the number that hides the most. On-time percentage tells you what happened after a load was already delivered. It says nothing about whether the carrier will accept the next tender, how it behaves at your worst-dwelling facility, how often it files a claim, or whether the dependable partner you scored last spring still runs the same trucks and drivers today. A real scorecard measures the signals that predict reliability before the load moves, not the one lagging signal that confirms it afterward.
TL;DR
A carrier scorecard built on on-time percentage grades the past and misses the future. The signals that actually predict whether a carrier performs are tender acceptance rate, communication responsiveness, dwell and detention behavior at specific facilities, claims ratio, POD timeliness, safety and authority standing, and, above all, drift: whether a carrier that scored well last year is still the same operation this quarter. These signals live in separate systems (TMS, tracking, claims, accounting, email, and FMCSA safety records), so almost nobody assembles them into one view. The fix is not a better spreadsheet. It is connected data that makes every signal visible, sourced, and comparable per lane and per facility, so the person booking the load can see the whole picture and still decide who to book.
Key takeaways
- On-time percentage is a lagging indicator: it confirms reliability after delivery and predicts almost nothing about the next load, because it never sees the tenders the carrier rejected.
- FreightWaves has documented that even large truckload rate increases did little to improve carrier reliability, so paying more is not the same as buying dependability.
- Dwell is a first-class reliability signal: ATRI found drivers were detained at 39.3 percent of stops in 2023, rising to 56.2 percent on refrigerated loads and 42.5 percent among spot-market fleets.
- Detention data hides in accounting, not the TMS: ATRI found 94.5 percent of fleets bill detention but get paid on fewer than half of those invoices, so the cost rarely lands next to the carrier's on-time number.
- Carriers drift. The ATA reports large-fleet driver turnover has run as high as 91 percent, and that figure reflects churn between carriers rather than exits from the industry, so last year's scorecard often describes a roster that no longer exists.
- The carrier pool itself churns: FreightWaves counted a net loss of more than 4,500 carriers across a single recent stretch, which is why a scorecard has to update continuously, not once a year.
On-time percentage is a lagging signal, not a reliability signal
Start with what on-time percentage can and cannot see. It is measured after the wheels stop, from delivery timestamps the carrier already earned. That makes it a report card on loads that are done, not a forecast of loads you still have to cover. The single most useful thing to know about a carrier on a Tuesday morning is whether it will accept the load you are about to tender. On-time percentage is silent on exactly that question, because it only counts the loads the carrier accepted and delivered. Every tender the carrier quietly rejected is invisible to it.
That is how a carrier posts a beautiful number while being unreliable in practice. A carrier that accepts two of every ten tenders you offer, and delivers those two on time, shows up at nearly 100 percent on-time. The eight loads it walked away from, the ones you scrambled to re-cover at spot, never touch the score. Price does not rescue you here either. FreightWaves has documented that even sizable truckload rate increases did little to improve carrier reliability: when contract rates climbed sharply, tender rejections stayed stubbornly elevated, because a carrier rejects a load for capacity reasons as often as for money. Paying up is not the same as buying dependability, and a scorecard that leans on price and past on-time percentage will keep recommending carriers that fail you at the moment capacity is tight.
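The two-of-ten example can be worked through in a few lines. This is a minimal sketch: the `Tender` record and both function names are hypothetical, and it assumes on-time percentage is computed only over accepted loads, as described above.

```python
from dataclasses import dataclass


@dataclass
class Tender:
    accepted: bool           # did the carrier accept the tender?
    delivered_on_time: bool  # only meaningful when accepted is True


def on_time_pct(tenders):
    """On-time percentage over accepted loads only; rejected tenders are invisible."""
    accepted = [t for t in tenders if t.accepted]
    if not accepted:
        return None
    return 100.0 * sum(t.delivered_on_time for t in accepted) / len(accepted)


def acceptance_pct(tenders):
    """Share of all offered tenders that the carrier accepted."""
    if not tenders:
        return None
    return 100.0 * sum(t.accepted for t in tenders) / len(tenders)


# Ten tenders offered: two accepted and delivered on time, eight rejected.
tenders = [Tender(True, True)] * 2 + [Tender(False, False)] * 8

print(on_time_pct(tenders))     # 100.0
print(acceptance_pct(tenders))  # 20.0
```

The same ten tenders produce a perfect on-time figure and a 20 percent acceptance rate, which is why the two numbers have to be read side by side.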
On-time percentage has a second flaw: it is an average, and averages lie by blending. A carrier can run at 98 percent on your easy dedicated van lane and 71 percent on your hard reefer lane into the Southeast, and the headline number splits the difference into something that looks fine and describes neither lane. The load you actually care about is the hard one, and the average is designed to hide it.
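A short sketch of the blending effect, assuming hypothetical load counts of 200 on the van lane and 100 on the reefer lane. Only the 98 and 71 percent rates come from the example above; the lane names and volumes are illustrative.

```python
# Hypothetical load counts; only the 98 and 71 percent rates come from the text.
lanes = {
    "dedicated_van": {"loads": 200, "on_time": 196},    # 98 percent
    "reefer_southeast": {"loads": 100, "on_time": 71},  # 71 percent
}


def pct(on_time, loads):
    """On-time percentage rounded to one decimal place."""
    return round(100.0 * on_time / loads, 1)


# Per-lane view: each lane keeps its own number.
per_lane = {name: pct(v["on_time"], v["loads"]) for name, v in lanes.items()}

# Headline view: every load pooled into one average.
total_loads = sum(v["loads"] for v in lanes.values())
total_on_time = sum(v["on_time"] for v in lanes.values())
blended = pct(total_on_time, total_loads)

print(per_lane)  # {'dedicated_van': 98.0, 'reefer_southeast': 71.0}
print(blended)   # 89.0
```

Under these assumed volumes the headline lands at 89 percent, a figure that matches neither lane, and it moves toward whichever lane carries more loads.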
The signals a real scorecard actually weighs
A real scorecard is not one number. It is a small set of signals, each of which predicts a different kind of failure, read together. Here is what belongs on it, what each signal predicts, and (the uncomfortable part) where the data for each signal lives today.
| Signal | What it predicts | Where the data lives today |
|---|---|---|
| Tender acceptance rate | Whether the carrier will cover the load you book, not just the ones it likes | TMS tender logs, routing-guide history |
| Communication responsiveness | Exception risk: a carrier that goes quiet is the one that surprises you | Email threads, tracking pings, dispatcher notes |
| Dwell and detention behavior | Facility fit and hidden cost drag at a specific shipper | Tracking timestamps, accounting, detention invoices |
| Claims ratio | Freight integrity and rework cost per hundred loads | Claims system, accounting, POD records |
| POD timeliness | Cash-to-cash speed and billing friction | Document management, TMS, accounting |
| Safety and authority standing | Whether the carrier is legal, insured, and roadworthy today | FMCSA SAFER, insurance certificates |
| Performance drift | Whether last year's reliable carrier is still reliable this quarter | The trend across every system above, over time |
Read that right column again. No two of those signals live in the same place. That is the whole problem, and we come back to it below. First, walk the signals themselves.
Tender acceptance rate is the earliest and most honest predictor of reliability, because the load you could not cover on time was usually a load that was never accepted. A carrier's acceptance rate on your specific lanes tells you far more about next week than its on-time percentage tells you about last week.
Communication responsiveness is the leading edge of an exception. The carrier that stops answering check calls at 2 p.m. is the carrier whose driver is already sitting at a closed dock, and the silence shows up hours before the late-delivery flag does. Responsiveness is a reliability signal precisely because it is early, and an exception caught early is an exception that does not cascade into the next three loads on the same truck.
Dwell and detention behavior is where the scorecard gets specific and expensive. How a carrier behaves at a facility is not one number. It is a facility-by-facility pattern, and detention is strikingly common. ATRI found that drivers reported being detained at 39.3 percent of all stops in 2023, a share that climbed to 56.2 percent on refrigerated loads and 42.5 percent among fleets that run the spot market. Detention is not just a carrier problem. It is a facility problem that a good scorecard attributes to the right shipper, so you know which lane looks cheap on the rate sheet and quietly burns hours at the dock. Recovering detention and demurrage starts with measuring them at the facility, not the carrier.
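Measuring detention at the facility can be sketched from arrival and departure timestamps. This is an illustration, not a prescribed method: the two-hour free-time window is an assumption (it varies by contract), and the facility names and stop rows are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FREE_TIME = timedelta(hours=2)  # assumed free-time window; set per contract

# (facility, arrival, departure): illustrative rows, not real data
stops = [
    ("DC_ATLANTA", datetime(2026, 6, 1, 8, 0), datetime(2026, 6, 1, 9, 30)),
    ("DC_ATLANTA", datetime(2026, 6, 2, 8, 0), datetime(2026, 6, 2, 12, 15)),
    ("DC_ATLANTA", datetime(2026, 6, 3, 8, 0), datetime(2026, 6, 3, 11, 0)),
    ("PLANT_OMAHA", datetime(2026, 6, 1, 7, 0), datetime(2026, 6, 1, 8, 0)),
    ("PLANT_OMAHA", datetime(2026, 6, 2, 7, 0), datetime(2026, 6, 2, 8, 45)),
]


def detention_share_by_facility(stops, free_time=FREE_TIME):
    """Fraction of stops at each facility where dwell exceeded free time."""
    counts = defaultdict(lambda: [0, 0])  # facility -> [detained, total]
    for facility, arrival, departure in stops:
        counts[facility][1] += 1
        if departure - arrival > free_time:
            counts[facility][0] += 1
    return {f: detained / total for f, (detained, total) in counts.items()}


shares = detention_share_by_facility(stops)
for facility, share in shares.items():
    print(f"{facility}: {share:.0%} of stops detained")
```

Grouping by facility rather than by carrier is the point of the sketch: the same carrier shows two detained stops out of three at one dock and none at the other.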
Claims ratio measures freight integrity: how often a carrier damages, shorts, or loses freight per hundred loads, and what the rework costs. A carrier can be perfectly on time and still generate a claim on one load in forty, and that claim is real margin gone plus a customer relationship strained.
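As a small worked example, the one-in-forty figure converts to a per-hundred rate as follows. The function name is hypothetical.

```python
def claims_per_hundred(claims, loads):
    """Claims filed per 100 delivered loads."""
    if loads == 0:
        raise ValueError("no loads to score")
    return 100.0 * claims / loads


# One claim in forty loads, as in the example above.
rate = claims_per_hundred(claims=1, loads=40)
print(rate)  # 2.5
```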
POD timeliness looks like paperwork and behaves like cash. A carrier that delivers on time but returns the proof of delivery four days late has stalled your billing, extended your cash-to-cash cycle, and created a collections problem that never shows up in on-time percentage.
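POD lag is simple to compute once delivery and document dates sit in the same place. A minimal sketch, with made-up load IDs and dates:

```python
from datetime import date
from statistics import median

# (load_id, delivered_on, pod_received_on): illustrative rows, not real data
loads = [
    ("L-1001", date(2026, 6, 1), date(2026, 6, 1)),
    ("L-1002", date(2026, 6, 3), date(2026, 6, 7)),  # POD arrives four days after delivery
    ("L-1003", date(2026, 6, 5), date(2026, 6, 6)),
]


def pod_lag_days(loads):
    """Days between delivery and receipt of the proof of delivery, per load."""
    return {load_id: (pod - delivered).days for load_id, delivered, pod in loads}


lags = pod_lag_days(loads)
median_lag = median(lags.values())

print(lags)        # {'L-1001': 0, 'L-1002': 4, 'L-1003': 1}
print(median_lag)  # 1
```

All three loads here could have been delivered on time, yet one of them holds up billing for four days.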
Safety and authority standing is the vetting signal, and it is public. FMCSA's SAFER system and safety measurement data expose a carrier's operating authority status, insurance, out-of-service rates, and crash history. A carrier that was fine to book last quarter can have its authority revoked or its insurance lapse this quarter, and a scorecard that never re-checks is scoring a company that may no longer be legal to haul your freight.
Performance drift is the signal that ties the rest together, and it gets its own section, because it is the one almost every scorecard ignores.
One signal is not a scorecard
Any single one of these numbers, read alone, is misleading. High tender acceptance with a bad claims ratio is a carrier that shows up and breaks your freight. Perfect on-time with terrible dwell is a carrier subsidized by the hours your shipper's dock is stealing. The scorecard is the combination, weighted for the lane, not the winner of a single-metric contest.
Why one headline number lies, and a per-lane, per-facility picture does not
The instinct to compress a carrier into a single grade is understandable and wrong. Reliability is not a property of a carrier. It is a property of a carrier on a lane, into a facility, on a given equipment type, right now. The same carrier can be your most dependable van partner in the Midwest and a coin flip on flatbed into a congested port. A headline score averages those two truths into a third number that is false about both.
This is where the BENCHMARK view earns its place. The question worth answering is not "is this carrier good" but "how does this carrier compare, on this lane, into this facility, against every other carrier I run there." Answering it requires comparing a carrier against your own connected network rather than a vendor's opinion of it. When you can rank every reefer carrier you use into the Southeast by acceptance rate, dwell, and claims on that specific corridor, the booking decision stops being a gut call and becomes a comparison you can defend. For an asset carrier scoring its own board, the same per-lane view is what separates a lane that pays from one that just runs up deadhead miles between loads.
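A per-lane comparison of this kind might look like the sketch below. It is not the BENCHMARK implementation: the `LaneStats` fields, the sample carriers, and the sort order (acceptance first, then dwell, then claims) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneStats:
    carrier: str
    lane: str
    acceptance_pct: float
    avg_dwell_hours: float
    claims_per_100: float


# Illustrative rows, not real data.
rows = [
    LaneStats("Carrier A", "reefer_southeast", 82.0, 3.1, 1.0),
    LaneStats("Carrier B", "reefer_southeast", 95.0, 2.2, 2.5),
    LaneStats("Carrier C", "reefer_southeast", 95.0, 1.8, 0.5),
    LaneStats("Carrier A", "van_midwest", 99.0, 1.0, 0.2),
]


def rank_on_lane(rows, lane):
    """Rank carriers on one lane: higher acceptance first, then lower dwell, then fewer claims."""
    on_lane = [r for r in rows if r.lane == lane]
    return sorted(
        on_lane,
        key=lambda r: (-r.acceptance_pct, r.avg_dwell_hours, r.claims_per_100),
    )


ranked = [r.carrier for r in rank_on_lane(rows, "reefer_southeast")]
print(ranked)  # ['Carrier C', 'Carrier B', 'Carrier A']
```

Each signal stays visible in the ranked rows instead of being collapsed into one grade, and Carrier A's strong Midwest van numbers never leak into its reefer ranking.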
Carriers drift, so a scorecard is a moving picture
Here is the assumption buried in most scorecards: that a carrier is a stable thing. It is not. The carrier you booked last year may not be the carrier you booked last week, because the people and equipment behind the name turn over constantly. The ATA has reported that driver turnover at large truckload fleets has run as high as 91 percent, and it is careful to note that this figure measures churn between carriers, not people leaving the industry. Drivers move from fleet to fleet chasing pay and home time. The consequence for your scorecard is direct: the roster that earned last year's on-time number has largely been replaced by the roster hauling your freight today.
The carrier pool itself churns just as hard. FreightWaves counted a net loss of more than 4,500 carriers across a single recent stretch as capacity exited the market, part of a broader truckload capacity exodus that rising rejection rates revealed even while demand was soft. Small carriers lose authority. Owner-operators park equipment. The mix of who is available to haul your lane is different this month than last, so a scorecard frozen at a point in time is describing a market that has already moved.
This is why drift belongs on the scorecard as a signal in its own right, and why WATCH matters more than a snapshot. What you want to catch is the carrier whose acceptance rate on your core lane has slipped four weeks running, or whose dwell at your top facility has crept up since the new dispatcher took over, while the headline on-time number still looks fine because the trend has not finished playing out. Reliability is a trajectory, not a photograph, and the failures worth preventing announce themselves in the slope long before they show up in the average.
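Reading the slope rather than the average can be sketched with two small functions. The weekly acceptance figures are invented, the function names are hypothetical, and the four-week threshold simply mirrors the example above. This is not how WATCH is implemented.

```python
def weeks_declining(series):
    """Count consecutive week-over-week declines ending at the latest week."""
    streak = 0
    for prev, curr in zip(reversed(series[:-1]), reversed(series[1:])):
        if curr < prev:
            streak += 1
        else:
            break
    return streak


def slope(series):
    """Least-squares slope of the series against week index (needs at least two points)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Illustrative weekly acceptance rates on one lane, oldest first.
weekly_acceptance = [94, 95, 93, 90, 88, 85]

streak = weeks_declining(weekly_acceptance)
trend = slope(weekly_acceptance)
drifting = streak >= 4 and trend < 0

print(streak)    # 4
print(drifting)  # True
```

The six-week average of this series is still above 90 percent, which is exactly the situation described: the headline looks fine while the streak and the slope already point down.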
A frozen scorecard scores a carrier that no longer exists
An annual carrier review grades a company against a roster, an equipment pool, and an authority status that may all have changed since the last review. In a market shedding thousands of carriers and churning drivers between fleets at high rates, the useful scorecard is the one that updates as the signals change, not the one printed once a year and trusted until the next offsite.
Why nobody assembles the real scorecard
If these signals are so predictive, why does almost every scorecard still come down to on-time percentage? Because on-time percentage is the only signal that already lives, whole, in one system. Every other signal is scattered. Tender acceptance sits in the TMS. Dwell lives in tracking timestamps. Detention lives in accounting and in invoices the carrier may never get paid on. Claims live in a claims system. PODs live in document management. Responsiveness lives in email threads and dispatcher memory. Safety and authority live at FMCSA. No single screen holds more than a slice, so assembling the real picture means a human stitching all of those systems together by hand for every carrier. Nobody has time to do that, so nobody does it.
The detention numbers make the orphaned-data problem concrete. ATRI found that 94.5 percent of fleets bill for detention, yet get paid on fewer than half of those invoices. The cost of a carrier that dwells is real, it is measured, and it is sitting in an accounting system that never talks to the on-time number in the TMS. The signal exists. It is just stranded one system away from the decision it should inform.
The scorecard problem is a data problem, not a spreadsheet problem
You cannot manually reconcile tender logs, tracking pings, claims records, POD timestamps, and FMCSA authority status for every carrier, every week, without an army. The reason on-time percentage wins by default is not that it is the best signal. It is that it is the only one already assembled. Fix the assembly and the better signals become usable.
This is exactly the gap a connected dataset closes, and it is the whole thesis of the Mithrilis platform: intelligence from connected data, not automation of a single workflow. When your TMS, tracking, accounting, claims, and document systems are unified against one shipment and one carrier record, each signal on the scorecard becomes visible, and just as importantly, sourced. We do not hand you a black-box grade you cannot argue with, and we do not surface a single opaque score in place of the evidence. Every signal traces back to the row it came from, so before you book, you can see the acceptance rate, the dwell trend, and the claim that drove the number, and verify it. That principle, that you should be able to check every result, is written into our manifesto. With the data connected, Atlas answers the plain-English question directly: how did this carrier actually perform on this lane into this facility over the last quarter, and is it getting better or worse. The scorecard stops being a spreadsheet someone updates once a quarter and becomes a live, sourced view the person booking can trust.
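The idea of a sourced signal can be illustrated with a small data structure. This is a sketch of the general principle only, not the Mithrilis data model: the class names, the feed layout, the carrier ID, and the row identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourcedValue:
    value: float
    system: str  # which system the number came from
    row_id: str  # hypothetical identifier of the originating row


@dataclass
class CarrierRecord:
    carrier_id: str
    signals: dict = field(default_factory=dict)  # signal name -> SourcedValue


def assemble(feeds):
    """Merge per-system feeds into one record per carrier, keeping provenance."""
    records = {}
    for system, rows in feeds.items():
        for row in rows:
            rec = records.setdefault(row["carrier_id"], CarrierRecord(row["carrier_id"]))
            rec.signals[row["signal"]] = SourcedValue(row["value"], system, row["row_id"])
    return records


# Illustrative feeds from three separate systems, not real data.
feeds = {
    "tms": [{"carrier_id": "C-17", "signal": "acceptance_pct", "value": 88.0, "row_id": "tender-5521"}],
    "accounting": [{"carrier_id": "C-17", "signal": "unpaid_detention_invoices", "value": 3, "row_id": "inv-0098"}],
    "claims": [{"carrier_id": "C-17", "signal": "claims_per_100", "value": 2.5, "row_id": "clm-0412"}],
}

records = assemble(feeds)
for name, sv in records["C-17"].signals.items():
    print(f"{name} = {sv.value} (from {sv.system}, row {sv.row_id})")
```

Because every value carries its system and row, a number on the scorecard can be traced back and checked rather than taken on faith.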
Brokers and asset carriers score carriers differently
The scorecard matters to both sides of the market, and it means something a little different for each.
For a freight broker, the scorecard is cover-risk management. You are grading partner carriers on whether they will accept and perform on your customers' lanes, and the edge is knowing, before you tender, which carrier is trending up on this exact lane and which one has quietly slipped. A benchmarked, per-lane scorecard turns carrier selection from a relationship habit into a defensible decision, and it protects the margin that a bad cover or a surprise claim would erase.
For an asset carrier, the scorecard points two directions. You score the owner-operators and partner or lease carriers you use to flex capacity, on the same acceptance, dwell, and claims signals a broker would. You also live on the other end of a shipper's scorecard, which means the dwell and POD numbers you post are the ones deciding whether you keep the lane. Seeing your own performance the way your customer sees it, per facility and over time, is how you defend your best lanes and walk away from the facilities that quietly burn your drivers' hours. With owned trucks and drivers, the cost of a lane that looks fine on the rate sheet and drifts on service is felt directly, so the multi-signal picture matters more, not less.
See your real carrier scorecard, sourced and per lane
On-time percentage is not wrong. It is just late, and alone. The carriers that fail you at the worst moment tend to look fine on the one number most scorecards are built from, because the signals that would have warned you, a slipping acceptance rate, a creeping dwell, a lapsing authority, a roster that turned over, were sitting in systems the scorecard never read.
The fastest way to see the difference is to look at your own carriers through connected data. Mithrilis unifies the systems these signals hide in, keeps every number traceable to its source, and updates the picture continuously instead of once a year, so drift shows up as a trend and not a surprise. You still decide who to book. The scorecard just finally shows you the whole carrier instead of the last thing it delivered. Request a demo and we will build the real scorecard on your own lanes and facilities.
Related Mithrilis capabilities
The Mithrilis platform
How connected data turns scattered carrier signals into one sourced, verifiable view.
For freight brokers
Score partner carriers on acceptance, dwell, and claims per lane, not just on-time.
For asset carriers
Score owned and partner capacity, and see your own service the way shippers do.
Stop exceptions before they cascade
Why communication and dwell signals catch failures early, in WATCH mode.
Frequently asked questions
A good scorecard weighs the signals that predict reliability before the load moves, not just the on-time percentage that confirms it afterward. That means tender acceptance rate, communication responsiveness, dwell and detention behavior at specific facilities, claims ratio, POD timeliness, and safety and authority standing, read together and compared per lane and per facility rather than compressed into a single grade. Most importantly it tracks drift over time, because a carrier that scored well last year may not be the same operation this quarter.
On-time percentage is a lagging average. It is measured after delivery, so it only counts loads the carrier accepted and delivered, and it is blind to every tender the carrier rejected. A carrier that accepts two of ten tenders and delivers those two on time can post nearly 100 percent while leaving you to re-cover the other eight. It also blends easy and hard lanes into one number that describes neither, which is why a real scorecard reads it alongside acceptance, dwell, and claims, per lane.
Tender acceptance rate is the share of loads a carrier actually accepts out of those you tender at the agreed rate. It matters because the load you cannot cover on time is usually one that was never accepted, so acceptance predicts next week better than last quarter's on-time number does. And price does not fix it: FreightWaves has documented that even large truckload rate increases did little to improve carrier reliability, because carriers reject loads for capacity reasons as much as for money.
Carriers are not stable. The ATA reports large-fleet driver turnover has run as high as 91 percent, a figure that reflects churn between carriers rather than people leaving the industry, so the roster that earned last year's score is largely replaced by the one hauling your freight now. The carrier pool churns too: FreightWaves counted a net loss of more than 4,500 carriers across a single recent stretch. A scorecard frozen at a point in time describes a market that has already moved, which is why drift belongs on it as its own signal.
In pieces, across systems that rarely talk to each other. Tender acceptance sits in the TMS, dwell in tracking timestamps, detention in accounting and invoices, claims in a claims system, PODs in document management, responsiveness in email, and safety and authority at FMCSA. That fragmentation is why most scorecards default to on-time percentage: it is the only signal already assembled in one place. ATRI's finding that 94.5 percent of fleets bill detention but get paid on fewer than half of those invoices shows how a real, measured cost can sit stranded one system away from the decision it should inform.
Yes, in two directions. Asset carriers score the owner-operators and partner or lease carriers they use to flex capacity, on the same acceptance, dwell, and claims signals a broker would use. They also live on the receiving end of their shippers' scorecards, so the dwell and POD numbers they post decide whether they keep a lane. With owned trucks and drivers, a lane that looks fine on the rate sheet but drifts on service costs them directly, so the multi-signal, per-facility picture matters more, not less.