How to Clean Up HubSpot: Contacts, Companies, and Deals
HubSpot makes it unusually easy to get data in — forms, imports, an app marketplace full of tools with write access — and that’s exactly why portals get messy. This guide is the HubSpot-specific version of our general CRM cleanup process: the same order of operations, mapped onto HubSpot’s actual tools, billing model, and quirks. “We” here means consultants who run this on client portals, and on our own.
Step 1: Snapshot the portal before touching anything
Export contacts, companies, and deals before any edits. In the UI, go to Settings → Import & Export and run an export of all properties for each object; if you have API access, the CRM exports API does the same thing in a scriptable, repeatable way. Either path gives you a frozen baseline: a rollback reference if a merge or bulk edit goes wrong, and the “before” picture when someone asks why the pipeline number changed.
Two HubSpot-specific notes. First, export all properties, not just the columns in your default view — the properties you didn’t think to include are usually the ones you need later. Second, exports don’t capture everything (activity timelines, for instance, live separately), so treat the snapshot as a property-level baseline, not a full backup. It’s still the difference between a reversible cleanup and a leap of faith.
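One way to put the snapshot to work as a rollback reference is to diff it against a later export. The following is a minimal sketch, assuming both exports are CSV text keyed by a record ID column. The column name `Record ID` and the helpers `load_snapshot` and `diff_snapshots` are illustrative assumptions, so check them against the headers in your own export.

```python
import csv
import io


def load_snapshot(csv_text, id_column="Record ID"):
    """Index exported rows by record ID (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row for row in reader}


def diff_snapshots(before, after):
    """Return records removed, added, and changed between two snapshots."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {}
    for record_id in set(before) & set(after):
        # Compare only the properties present in the baseline row.
        deltas = {
            prop: (before[record_id].get(prop), after[record_id].get(prop))
            for prop in before[record_id]
            if before[record_id].get(prop) != after[record_id].get(prop)
        }
        if deltas:
            changed[record_id] = deltas
    return removed, added, changed
```

Run after a merge or bulk edit, the `changed` mapping shows the old and new value of every property that moved, which is the detail you need to reverse a bad edit by hand or by re-import.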
Step 2: Audit with saved filtered views
HubSpot’s index pages (Contacts, Companies, Deals) support saved views with property filters, and they’re the fastest way to turn audit rules into something you can actually look at. The views we build on every engagement:
- No owner — filter where the owner property is unknown, per object. Unowned records decay fastest because nobody is accountable for them.
- No activity in 30+ days (open deals) — “Last activity date” more than 30 days ago, deal stage is any open stage. Tune the threshold to your sales cycle.
- Close date in the past (open deals) — the single fastest way to find a fictional forecast.
- Missing amount (open deals past qualification) — deals with no amount past your qualification stage are placeholders, not pipeline.
- Owned by deactivated users — filter on owners who have left; HubSpot doesn’t reassign their records for you.
Save each view and share it with the team — a saved view is a living audit rule, re-evaluated every time someone opens it.
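The same deal rules can be run as a script over an export, which helps when you want counts per rule rather than a view to scroll through. This is a sketch under stated assumptions: the dictionary keys (`is_open`, `owner_id`, `last_activity_date`, `close_date`, `amount`, `past_qualification`) are hypothetical fields you would derive from your own export, not HubSpot internal property names. Treating a deal with no recorded activity as stale is also our assumption.

```python
from datetime import date, timedelta


def audit_deal(deal: dict, today: date, stale_days: int = 30,
               deactivated_owner_ids=frozenset()) -> list:
    """Return the names of the audit rules that an open deal violates."""
    if not deal.get("is_open"):
        return []  # the rules here apply to open deals only
    findings = []
    owner = deal.get("owner_id")
    if not owner:
        findings.append("no_owner")
    elif owner in deactivated_owner_ids:
        findings.append("deactivated_owner")
    last_activity = deal.get("last_activity_date")
    # Assumption: no recorded activity at all counts as stale.
    if last_activity is None or today - last_activity > timedelta(days=stale_days):
        findings.append("stale")
    close_date = deal.get("close_date")
    if close_date is not None and close_date < today:
        findings.append("close_date_in_past")
    if deal.get("past_qualification") and deal.get("amount") is None:
        findings.append("missing_amount")
    return findings
```

Tallying the returned rule names across all open deals gives a per-rule count you can track from one audit to the next.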
For contact-level hygiene, active lists are the better tool, because they update continuously as contacts move in and out of the criteria. Build active lists for: contacts with no email, contacts who have never engaged, contacts created by a specific source with no lifecycle stage. These become both your cleanup queue and your ongoing monitor.
This is a HubSpot-flavored subset of the full 27-check audit — start with these, expand as you go.
Step 3: Duplicates — what the native tool does and doesn’t do
HubSpot ships a duplicate management tool (under data quality tooling, available on Professional and Enterprise tiers) that suggests likely duplicate contacts and companies as pairs for you to review and merge. Use it — it catches the obvious matches with no setup.
But understand its three limits before you declare victory:
- It’s pairwise and suggestion-based. It shows you candidate pairs it believes are duplicates; you review and merge each one. It is not a rule engine you can point at “every contact sharing an email domain plus normalized name.”
- It caps how many pairs it surfaces. The tool shows a bounded list of likely pairs, not an exhaustive sweep of the portal. A heavily duplicated database will have more duplicates than the tool ever displays at once.
- It does not cover deals. Contacts and companies only.
That third limit is the expensive one, because duplicate open deals inflate the pipeline number leadership looks at every week. Checking for them is manual or scripted: export open deals, group by associated company, and flag any company with two or more open deals — plus deals with identical names and amounts created within days of each other, the classic sync artifact. Our deduplication guide covers matching logic and merge order in depth.
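The scripted version of that check might look like the sketch below. It assumes each exported deal has been reduced to a dictionary with hypothetical keys `id`, `name`, `amount`, `company_id`, `is_open`, and `created` (an ISO `YYYY-MM-DD` string). The three-day window and the case-insensitive name match are illustrative choices, not fixed rules.

```python
from collections import defaultdict
from datetime import date


def companies_with_multiple_open_deals(deals):
    """Map company ID to deal IDs for companies holding 2+ open deals."""
    by_company = defaultdict(list)
    for deal in deals:
        if deal["is_open"] and deal.get("company_id"):
            by_company[deal["company_id"]].append(deal["id"])
    return {cid: ids for cid, ids in by_company.items() if len(ids) >= 2}


def likely_sync_duplicates(deals, max_days_apart=3):
    """Pair open deals sharing a name and amount, created within a few days."""
    groups = defaultdict(list)
    for deal in deals:
        if deal["is_open"]:
            # Names are compared after trimming and lowercasing (an assumption).
            key = (deal["name"].strip().lower(), deal["amount"])
            groups[key].append(deal)
    pairs = []
    for group in groups.values():
        group.sort(key=lambda d: date.fromisoformat(d["created"]))
        # Compare each deal with the next one created in the same group.
        for earlier, later in zip(group, group[1:]):
            gap = date.fromisoformat(later["created"]) - date.fromisoformat(earlier["created"])
            if gap.days <= max_days_apart:
                pairs.append((earlier["id"], later["id"]))
    return pairs
```

The first function over-flags on purpose: a company with two legitimate open deals (say, a renewal and an expansion) will appear, so treat its output as a review queue. The second is narrower and points more directly at sync artifacts.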
Step 4: The billing angle — junk contacts cost real money
On Marketing Hub, you pay based on your marketing contacts count. Every imported list remnant, spam form fill, and dead lead sitting in marketing status is a line on your invoice. This makes HubSpot cleanup unusual among CRMs: contact hygiene has a direct, recurring dollar value.
The lever is marketing contact status. Setting a contact to non-marketing keeps the record and its history but removes it from your billable count (the change takes effect at your next billing update, not instantly). Deleting frees capacity too, but takes the history with it. So the practical triage:
- Real but inactive → set non-marketing. Keep the history, stop paying.
- Pure junk (spam fills, test records, obvious bots) → delete.
- Active and emailable → leave as marketing.
Build an active list for “marketing contacts with no engagement and no open deal association” and review it before each billing renewal. It’s the rare data hygiene task with an invoice attached.
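If you review that list from an export rather than in the UI, the triage can be written down as a small function so everyone applies it the same way. This is only a sketch: the flags `is_junk`, `is_marketing`, `recently_engaged`, and `has_open_deal` are hypothetical fields you would compute yourself, and the function only suggests an action without changing anything in HubSpot.

```python
def triage_contact(contact):
    """Suggest an action for one contact, following the triage above."""
    if contact.get("is_junk"):
        # Spam fills, test records, obvious bots.
        return "delete"
    if not contact.get("is_marketing"):
        # Already off the billable count.
        return "no_change"
    if contact.get("recently_engaged") or contact.get("has_open_deal"):
        # Active and emailable.
        return "keep_marketing"
    # Real but inactive: keep the history, stop paying.
    return "set_non_marketing"


def billable_reduction(contacts):
    """Count marketing contacts the triage would take off the bill."""
    return sum(
        1
        for c in contacts
        if c.get("is_marketing")
        and triage_contact(c) in ("delete", "set_non_marketing")
    )
```

`billable_reduction` gives a rough count to put in front of whoever approves the change before the next billing update.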
Step 5: Provenance — find the machine making the mess
Every HubSpot record carries read-only record-source properties: hs_object_source (the type of source — import, form, API, integration, user) and hs_object_source_detail (the specific app, import, or form responsible). You can’t edit them, which is exactly what makes them trustworthy.
This is the most underused cleanup tool in HubSpot. Instead of merging duplicates one at a time forever, group your findings by source. Export the flagged records with their source properties and pivot: if 80% of your duplicate contacts trace to one integration, you don’t have a thousand data problems — you have one sync configuration problem. When we audited our own HubSpot portal, an outreach-tool sync had created 10 duplicate open deals, and we found it precisely this way: grouping records by hs_object_source made the pattern obvious in seconds.
Fix the generator before (or at least alongside) merging its output, or the duplicates regenerate on the next sync.
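The pivot itself is a few lines once the flagged records are exported with their source properties. A minimal sketch, assuming each record is a dictionary carrying `hs_object_source` and `hs_object_source_detail`; the function name and the `UNKNOWN` placeholder for blank values are our own choices.

```python
from collections import Counter


def source_breakdown(flagged_records):
    """Count flagged records per (source, source detail), largest first."""
    counts = Counter(
        (
            r.get("hs_object_source") or "UNKNOWN",
            r.get("hs_object_source_detail") or "UNKNOWN",
        )
        for r in flagged_records
    )
    total = sum(counts.values())
    # Each row: (source, detail, count, share of all flagged records).
    return [
        (source, detail, n, n / total)
        for (source, detail), n in counts.most_common()
    ]
```

If the first row carries most of the share, that source is the generator to fix first.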
Step 6: Prevention — workflows and validation
HubSpot gives you real prevention tools, with real limits:
- Form validation stops junk at the door: required fields, email validation, blocking free email providers where appropriate.
- Property validation rules can constrain what users enter on a property — formats, allowed values — so “New York” stops appearing four different ways.
- Workflows can enforce hygiene continuously: copy and standardize property values, set owners on assignment rules, flag deals when a close date slips into the past, notify managers when required fields go missing at a stage change. Higher tiers add data-quality automation that suggests and applies formatting fixes.
The limit: workflows react to records HubSpot already has, and they only enforce what you’ve explicitly built. They won’t deduplicate for you, and they can’t reach into a misconfigured integration. Prevention in HubSpot is a complement to the source-level fixes in Step 5, not a substitute.
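To make "standardize property values" concrete, here is the kind of normalization logic a standardizing step applies, written as a standalone sketch for cleaning an export before re-import. The alias table is hypothetical (build yours from the variants your audit actually finds), and this is not a HubSpot workflow action.

```python
import re

# Hypothetical alias table; extend it with the variants found in your data.
CITY_ALIASES = {
    "ny": "New York",
    "nyc": "New York",
    "new york city": "New York",
}


def normalize_city(raw):
    """Collapse whitespace and case variants, then apply known aliases."""
    cleaned = re.sub(r"\s+", " ", raw or "").strip()
    if not cleaned:
        return None
    key = cleaned.lower().rstrip(".")
    # Fall back to title case when no alias matches.
    return CITY_ALIASES.get(key, cleaned.title())
```

Title-casing is a blunt fallback: it will mangle names with internal capitals, so review the distinct output values before writing anything back.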
Step 7: Archive vs. delete
HubSpot’s deletion model has three tiers, and conflating them causes real damage:
- Non-marketing / unenrolled — the record stays, full history intact, off your marketing bill. The default for “real but done.”
- Normal delete — the record goes to a recycle bin and is restorable for a window (90 days for contacts). History and associations are disrupted, but you have an undo path.
- GDPR-compliant delete — permanent erasure of the contact and associated data, no recycle bin, no restore. This exists for legal right-to-erasure requests. Do not use it as a cleanup convenience; we’ve seen portals where someone GDPR-deleted a stale list and discovered the attribution history was unrecoverable.
The general rule from our main cleanup guide applies doubly in HubSpot: archive-equivalents first, delete only pure junk, and GDPR-delete only when the law asks you to.
Running this on a schedule
One pass through these steps gets you clean; the saved views, active lists, and source-grouped audits are what keep you clean. Re-check weekly, watch the trend per rule, and treat any spike from a single hs_object_source_detail as a bug report against that integration.
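Spotting a spike can be as simple as comparing this week's counts per source detail with last week's. The sketch below assumes you keep a mapping of `hs_object_source_detail` to flagged-record count from each run; the function name and both thresholds are arbitrary illustrative defaults to tune for your portal.

```python
def spiking_sources(previous, current, min_increase=5, ratio=2.0):
    """Return source details whose flagged count jumped since the last check.

    `previous` and `current` map hs_object_source_detail to a count of
    flagged records. A source spikes when it grew by at least `min_increase`
    records and at least doubled (by default) relative to the prior count.
    """
    spikes = []
    for detail, now in current.items():
        before = previous.get(detail, 0)
        if now - before >= min_increase and now >= ratio * max(before, 1):
            spikes.append((detail, before, now))
    # Largest absolute increase first.
    return sorted(spikes, key=lambda s: s[2] - s[1], reverse=True)
```

Requiring both an absolute and a relative jump keeps large, steady sources from tripping the check every week while still catching a new integration that starts from zero.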
If you’d rather not run the audit by hand, our open-source fullstackgtm CLI implements these checks as deterministic rules against HubSpot — including the open-deal duplicate detection the native tool doesn’t do — and turns the findings into patch plans you review and approve before anything is applied.
Frequently asked questions
Does HubSpot have built-in duplicate management, and what are its limits?
Yes — the duplicate management tool suggests likely duplicate pairs of contacts and companies, which you review and merge one pair at a time. Its limits: it's suggestion-based rather than rule-based, it only surfaces a capped list of likely pairs rather than every duplicate in the portal, and it doesn't cover deals at all. Open-deal duplicates need a manual or scripted check.
Should I delete or archive contacts in HubSpot?
Archive (or set non-marketing) in most cases. Deleting frees up marketing-contact billing capacity, but it also destroys activity history, attribution, and the email engagement record. The better default: set real but inactive contacts to non-marketing so they stop costing money, and reserve deletion for pure junk like spam form fills and test records.
How do I find HubSpot contacts with no activity?
Build a filtered view or active list on the “Last activity date” property: for example, last activity over 180 days ago, or the property is unknown (never any activity). Layer in “Marketing contact status” to find inactive contacts you’re paying for, and “Create date” to exclude recent additions that simply haven’t had time to engage.
What is hs_object_source in HubSpot?
It's a read-only property HubSpot stamps on every record identifying what created it — an import, a form, an API integration, a sync, or a user. Its companion hs_object_source_detail names the specific app or import. Grouping records by these properties tells you which integration is generating your duplicates or junk.
Can I undo a deletion in HubSpot?
Usually, within a window. Normally deleted records go to a recycle bin and can be restored — contacts for up to 90 days. The exception is GDPR-compliant delete, which permanently erases the contact and its associated data with no restore path. Never use GDPR delete as routine cleanup; it exists for legal erasure requests.