CPF query via JSON API without friction

2026-02-23 01:06 (GMT-3) · 9 min read

You have seen this pattern in production: the registration passes, the payment is approved and, days later, the risk team discovers that the CPF provided does not exist, is irregular or does not match the name. The loss rarely stops at the chargeback. It shows up as a badly granted credit limit, identity fraud, support rework and even compliance exposure when the KYC flow becomes a gamble.

The CPF query via JSON API exists precisely to move this kind of decision from “looks valid” to “is verified”. But “querying a CPF” can mean two quite different things: validating the format and the check digits (mod-11), or confirming existence and registration status against an official database. For critical operations, this difference changes the result.

What the CPF query via JSON API needs to really solve

In product and engineering teams, it is common to start with the basics: checking whether the CPF has 11 digits and passes the check-digit calculation. This is useful to block typos and reduce junk in the funnel. The problem is that fraud and registration inconsistency do not respect check digits. A “mathematically valid” CPF can be nonexistent, suspended, cancelled, null or simply not correspond to the provided holder.
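To make the first barrier concrete, here is a minimal sketch of the mod-11 check in Python. The function name is illustrative. It only confirms that the number is internally consistent, which is exactly why it cannot replace the official query.

```python
def is_valid_cpf_format(cpf: str) -> bool:
    """Return True if the CPF has 11 digits and both check digits match (mod-11)."""
    # Punctuation such as "." and "-" is ignored; only ASCII digits are kept.
    digits = [int(ch) for ch in cpf if ch in "0123456789"]
    if len(digits) != 11:
        return False
    # Repeated sequences (111.111.111-11) satisfy the arithmetic but are conventionally rejected.
    if len(set(digits)) == 1:
        return False

    def check_digit(base: list) -> int:
        # Weights run from len(base) + 1 down to 2.
        total = sum(d * w for d, w in zip(base, range(len(base) + 1, 1, -1)))
        remainder = (total * 10) % 11
        return 0 if remainder == 10 else remainder

    return check_digit(digits[:9]) == digits[9] and check_digit(digits[:10]) == digits[10]
```

A number that passes this function is well formed. Whether it exists, and who holds it, is still unknown.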

A query via API in JSON, when well implemented, solves three layers at the same time.

The first is standardization: the application sends an identifier and receives a consistent JSON, ready to be consumed by any internal service, without depending on manual screens.

The second is the decision: you gain an objective signal of registration status, plus associated data to check against. This reduces false positives in the analysis and avoids releasing an expensive flow (credit, withdrawal, fiscal issuance) for an inconsistent registration.

The third is traceability: logs, request correlation, audit and query evidence in case of dispute or internal review.

JSON is the easy detail. The hard part is what you query

JSON is a format. It does not guarantee data quality. In practice, there are approaches with different levels of risk.

Local validation (mod-11) is cheap and instantaneous, but it does not confirm anything beyond the number’s integrity. It is an initial barrier, not a KYC mechanism.

A query against an official source, on the other hand, confirms existence and registration status. For regulated processes or those with high financial impact, this is the type of signal that matters. This is where the operational requirements come in: the official data must be up to date and the API must be available. If the API does not respond, onboarding stalls or you create a bypass, and a bypass is where fraud gets in.

What does this choice depend on? Volume, the criticality of the flow and the cost of an error. In a newsletter or a low-exposure registration, local validation may be enough. In fintech, crypto, betting/iGaming, mobility, marketplace and healthcare, the cost of a poorly validated identity is usually greater than the cost of querying.

How to design the query flow in onboarding

The best architecture is rarely “always query on the first field”. You want to reduce friction without losing control.

A common and efficient design is:

You validate the format and check digit on the front-end to prevent obvious errors. Then, you do the official query on the back-end when the user confirms the data, before releasing the next sensitive step (create an account, activate a wallet, release a limit, allow a withdrawal, issue an invoice).

If your operation has progressive risk steps, you can stagger them. For example: a query at registration to block clearly invalid cases and a second query before the relevant financial event. This helps when there is a change of data over time or when you need to reinforce compliance at a specific moment.

It also pays to define what happens when the API is unavailable. In operations with low tolerance for fraud, “fail closed” (block) is coherent. In operations that prioritize conversion, a controlled “fail open” can exist, but with limits: allow only navigation, do not allow a transaction, and require revalidation within X minutes. The important thing is that the exception is measurable and auditable.
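The unavailability rule above can be written down as an explicit policy instead of living in scattered conditionals. The sketch below is one possible shape, with hypothetical names (`OutagePolicy`, `access_during_outage`) and a grace window standing in for the "X minutes" mentioned in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    BROWSE_ONLY = "browse_only"  # navigation allowed, no transactions
    BLOCKED = "blocked"


@dataclass(frozen=True)
class OutagePolicy:
    fail_open: bool
    revalidate_within_s: int  # hypothetical grace window before revalidation is required


def access_during_outage(policy: OutagePolicy, seconds_since_signup: float) -> Access:
    """Decide what an unverified user may do while the official query is unavailable."""
    if not policy.fail_open:
        return Access.BLOCKED  # fail closed: no verification, no access
    if seconds_since_signup > policy.revalidate_within_s:
        return Access.BLOCKED  # grace window expired without revalidation
    return Access.BROWSE_ONLY  # controlled fail open: never a transaction
```

Because the policy is a plain value, each decision taken under it can be logged with the policy that produced it, which keeps the exception measurable and auditable.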

Authentication and security: the token is not optional

In a CPF query via JSON API, access control is part of the product. The most common pattern is API token authentication. For engineering, the rule is simple and critical: never expose the token on the front-end. The query must originate from your server, with the token stored in a secrets vault and rotated according to policy.
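A minimal server-side sketch of that rule, using only the Python standard library. The endpoint URL, the `Bearer` header scheme and the `CPF_API_TOKEN` variable are assumptions for illustration, not the contract of any specific provider. Check your provider's documentation for the real values.

```python
import os
import urllib.request

# Hypothetical endpoint; replace with the one your provider documents.
API_URL = "https://api.example.com/v1/cpf/{cpf}"


def build_cpf_request(cpf_digits: str) -> urllib.request.Request:
    """Build the outbound request on the back-end, reading the token from the environment."""
    # Assumption: the secrets vault injects the token as an environment variable.
    token = os.environ["CPF_API_TOKEN"]
    return urllib.request.Request(
        API_URL.format(cpf=cpf_digits),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
```

The request would then be sent with `urllib.request.urlopen(request, timeout=...)`. The token never appears in code shipped to the browser or the mobile app.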

In addition, treat the CPF as sensitive data within your ecosystem. Even when the applicable law does not require encryption at all layers, good security and LGPD practices call for minimization: store the minimum necessary, retain it for a defined period and restrict access by profile. Logging the CPF in plaintext in your observability stack is a common mistake that ends up as an incident.
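One way to keep plaintext CPFs out of logs is to mask them before the message is written. The sketch below is an assumption about how that could look. Which digits stay visible is a policy choice, and here the middle six are kept purely as an example.

```python
import re

# Matches 11 digits with or without the usual punctuation (529.982.247-25 or 52998224725).
CPF_PATTERN = re.compile(r"\b(\d{3})\.?(\d{3})\.?(\d{3})-?(\d{2})\b")


def mask_cpf(text: str) -> str:
    """Replace every CPF found in the text with a partially masked version."""
    return CPF_PATTERN.sub(lambda m: f"***.{m.group(2)}.{m.group(3)}-**", text)
```

A function like this can be applied in a logging filter or in the serializer that feeds your observability pipeline, so that no individual call site has to remember it.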

Latency, timeouts and what changes in your conversion

API integration is only as good as the experience of the person at the end. If the query takes 8 seconds, the user abandons. If the back-end has no timeout, your connection pool saturates.

Be pragmatic here: configure a network and application timeout, implement retry with backoff only on transient errors and adopt idempotency where it makes sense. If you query the same CPF several times in the same flow, use a cache with a short TTL to reduce cost and latency, but do not keep data that can change around indefinitely.
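The retry and cache pieces can be sketched independently of any HTTP client. In the example below, `TransientError`, `call_with_retry` and `TTLCache` are hypothetical names. The `fetch` callable stands for whatever function performs the request with its own timeout, and it is assumed to raise `TransientError` only for retryable failures such as timeouts, rate limits or 5xx responses.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeout, rate limit, 5xx)."""


def call_with_retry(fetch: Callable[[], dict], attempts: int = 3,
                    base_delay_s: float = 0.2,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch, retrying only on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the operational error
            sleep(base_delay_s * 2 ** attempt)  # 0.2s, 0.4s, 0.8s, ...
    raise ValueError("attempts must be at least 1")


class TTLCache:
    """Short-lived in-memory cache so one flow does not pay for the same query twice."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._items = {}  # key -> (stored_at, value)

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl_s:
            del self._items[key]  # expired: registration status may have changed
            return None
        return value

    def put(self, key: str, value: dict) -> None:
        self._items[key] = (self._clock(), value)
```

Any other exception raised by `fetch` propagates immediately, so a user error or a compliance block is never retried.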

You also need to decide what your internal API returns to the application. Many companies prefer not to return the detailed status to the end user. Instead, they return neutral, support-oriented messages, keeping status details in the back office. This reduces social engineering and enumeration attempts.

Response handling: success, inconsistency and operational error

A well-designed API delivers more than an “ok”. For product and risk, what matters is having clear states.

On the success path, you want registration status and associated attributes for checking, such as the name and, depending on the case, data that helps to reduce registration divergence.

When there is inconsistency, you want to differentiate a user error (an invalid or divergent CPF) from a compliance restriction (a status that prevents proceeding) and from an operational error (timeout, instability, rate limit). Mixing everything into “failure” becomes a blind funnel.
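A sketch of that separation follows. The HTTP codes, the `status` and `name` fields and the status labels are all assumptions about a generic provider. Map them to whatever your actual API returns.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    USER_ERROR = "user_error"
    COMPLIANCE_RESTRICTION = "compliance_restriction"
    OPERATIONAL_ERROR = "operational_error"


# Hypothetical labels for statuses that prevent proceeding.
BLOCKING_STATUSES = {"suspended", "cancelled", "null"}


def _normalize(name: str) -> str:
    return " ".join(name.split()).casefold()


def classify(http_status: Optional[int], body: Optional[dict], expected_name: str) -> Outcome:
    """Map a raw API result to one of four states (http_status None means no response)."""
    if http_status is None or http_status == 429 or http_status >= 500:
        return Outcome.OPERATIONAL_ERROR  # timeout, rate limit, instability
    # Assumption: the provider answers 400/404 for malformed or nonexistent CPFs.
    if http_status in (400, 404):
        return Outcome.USER_ERROR
    if http_status != 200 or body is None:
        return Outcome.OPERATIONAL_ERROR  # anything unexpected is not the user's fault
    if body.get("status") in BLOCKING_STATUSES:
        return Outcome.COMPLIANCE_RESTRICTION
    if _normalize(body.get("name") or "") != _normalize(expected_name):
        return Outcome.USER_ERROR  # CPF exists but does not match the provided holder
    return Outcome.OK
```

With distinct outcomes, each branch can feed its own metric, so a spike in operational errors is not mistaken for a wave of bad registrations.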

From an engineering point of view, think in contracts. If the return is JSON, define which fields are mandatory, which are optional and which can be null. Version the payload as well: changes without versioning break integrations silently.
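As an illustration of such a contract, the sketch below parses a payload into a typed object and fails loudly when the contract is broken. The field names (`version`, `cpf`, `status`, `name`, `birth_date`) are hypothetical. What matters is the distinction between mandatory, nullable and optional fields.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_VERSIONS = {"1"}  # hypothetical contract versions this service understands


@dataclass(frozen=True)
class CpfResult:
    version: str                       # mandatory
    cpf: str                           # mandatory
    status: str                        # mandatory
    name: Optional[str]                # mandatory key, value may be null
    birth_date: Optional[str] = None   # optional key


def parse_result(payload: dict) -> CpfResult:
    """Validate the payload against the contract before any business logic sees it."""
    missing = [f for f in ("version", "cpf", "status", "name") if f not in payload]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if payload["version"] not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported payload version: {payload['version']}")
    return CpfResult(
        version=payload["version"],
        cpf=payload["cpf"],
        status=payload["status"],
        name=payload["name"],
        birth_date=payload.get("birth_date"),
    )
```

Rejecting an unknown version at the boundary turns a silent break into an explicit, alertable error.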

Avoid the classic mistake: confusing validation with an official query

Fraudsters know mod-11. They generate “valid” CPFs in bulk. If your onboarding only validates the digit, you reduce typos, but you do not reduce fraud.

The official query adds the signal that mod-11 does not deliver: existence and registration status at the issuing agency. This is the kind of difference that shows up in the indicators: a drop in ghost registrations, less rework for the compliance team and a better approval rate in later layers, because you filter earlier.

There is a cost per query, of course. But the cost of the error is usually greater and less predictable. In high-volume operations, predictability is an asset: you trade diffuse losses for a controllable and measurable unit cost.

Where this query fits into KYC, KYB and fiscal issuance

In KYC, the query is a registration-hygiene step and a decision trigger. You can use the status to guide the flow: proceed, request a document, require a selfie, or deny.

In KYB, the same reasoning applies to the CNPJ, with a direct impact on the analysis of partners, sellers and providers. Many B2B2C operations suffer more from inconsistent company registration than from individuals.

In fiscal issuance and the registration of the service recipient (the customer named on the invoice), verification reduces operational error and avoids contingencies caused by incorrect fiscal data. Here, the gain is less about antifraud and more about efficiency and compliance.

Best practices that avoid pain after the first integration

If you want the CPF query via JSON API to become infrastructure and not a patch, treat it as a central component.

Monitor success rate, p95/p99 latency and error codes. Create alerts for degradation and not just for a total drop. At high volume, slow degradation is what hurts the most.
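For the latency part, a nearest-rank percentile over a window of samples is enough to show the idea. The function names and the 2.0-second budget are assumptions for this sketch. In practice these numbers usually come from your metrics system.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def is_degraded(latencies_s: list, p95_budget_s: float = 2.0) -> bool:
    """Flag degradation when p95 latency exceeds the budget, even if every call succeeded."""
    return percentile(latencies_s, 95) > p95_budget_s
```

An alert built on `is_degraded` fires while the API is still answering, which is the slow-degradation case that a simple up/down check misses.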

Document the business rules: which statuses block, which require review and which proceed with controlled risk. Without this, each squad interprets it in its own way and you lose compliance consistency.
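One way to keep that documentation from drifting is to make the rules a single table in code that every squad imports. The status labels and the actions assigned to them below are placeholders, not a recommendation. The real mapping is a decision for your risk and compliance teams.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    REVIEW = "review"
    BLOCK = "block"


# Hypothetical status labels and decisions; the real table belongs to risk/compliance.
STATUS_POLICY = {
    "regular": Action.PROCEED,
    "pending_regularization": Action.REVIEW,
    "suspended": Action.BLOCK,
    "cancelled": Action.BLOCK,
    "null": Action.BLOCK,
    "nonexistent": Action.BLOCK,
}


def decide(status: str) -> Action:
    """Look up the action for a status; unknown statuses go to manual review."""
    return STATUS_POLICY.get(status, Action.REVIEW)
```

Sending unknown statuses to review rather than to proceed means a new status introduced by the provider cannot silently open the flow.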

Also think about plan and cost governance. When the model is pay-per-use, controlling consumption per environment (dev, staging, production) and per product avoids surprises. Rate limits and separate keys per environment help a lot.

What to expect from an API ready for scale

For an operation that transacts, three promises matter more than slogans: coverage, freshness and response time. Coverage means being able to query every document that passes through your funnel, without gaps. Same-day updating (D+0) means making a decision based on the current state. And a consistent response time (in practice, something like 0.4 to 2.0 seconds) means the validation fits into onboarding without becoming a bottleneck.

Simple integration also matters. A common pattern is token authentication and an endpoint that returns standardized JSON, with a panel to track consumption, status and history. It is the type of product that the engineering team implements quickly and the risk team can audit.

If you are looking for this as infrastructure, CPF.CNPJ offers an API and panel with query and validation with official and updated data (D+0), designed for KYC/KYB and automation at scale: https://cpfcnpj.com.br

Closing this topic the right way is not about “having an endpoint”. It is about designing a reliable decision within your flow, with predictable latency, failure handling and clear rules for when to proceed and when to stop. When the query becomes a silent and measurable routine, your onboarding becomes faster for those who are legitimate and more expensive for those who try to defraud.
