
ID Verification with Deep Learning

Chau Dao
June 19, 2020

The cost of back-office operations is one of the biggest hurdles to making financial services accessible to everyone. In particular, the Know Your Customer (KYC) onboarding and verification process is cumbersome and usually requires manual review to onboard individuals and entities.

At Synapse, we employ modern artificial intelligence techniques based on computer vision¹ and deep learning to automate physical document verification. One such application is verifying government-issued identity documents, enabling us to identify and authenticate end-users.

Why this problem?

While many vendors offer identity document verification, partnering with those vendors presents three concerns:

  1. They typically charge a fixed cost per ID verification, which means we would need to pass those costs on to the platform, which in turn would likely pass them on to the end-user. As costs pile up for end-users, only those who can afford these expenses end up receiving best-in-class financial products, which is contrary to Synapse’s mission.
  2. We wouldn’t be able to retrain the model directly, since it is controlled by the vendor. This limits how we can adapt models to new use cases and prevents us from dynamically adjusting how we verify users to accommodate our user base.
  3. To add support for new types of documents, we would need to either find additional vendors or wait for the market to build support.

Given these reasons, we decided to build an in-house deep-learning-powered ID verification system.

Under the Hood

If someone went to a bank to open an account, they would provide their ID to the bank teller, who would review it in two main ways: looking for indications that the ID was fraudulent or altered, and verifying that the personal data the customer provided matched the data on the ID itself.

Our ID Verification flow mimics the same process and can be broken down into the following components:

  1. Document Detection: Locate the ID in the image
  2. Subfield Detection: Detect and locate important fields
  3. Face Detection: Detect human face on the ID and compute embeddings
  4. Facial Similarity Search: Find duplicate users based on face embeddings
  5. Text Read & Match: Extract text from located fields using OCR and compare to personal information provided
  6. Fraud Detection: Compare extracted information against internal and external fraud databases
  7. Security Features: Validate the authenticity of the document
  8. Verification: Combine insights from previous steps to determine the legitimacy of the document
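
As a rough illustration of how these components fit together, here is a minimal Python skeleton. Every name below is hypothetical, not Synapse’s actual API; real versions of several stages are sketched in the sections that follow.

```python
# Hypothetical skeleton of the eight-component flow; all names are
# illustrative. Each stage reads from and enriches a shared `state` dict,
# recording pass/fail results under state["checks"].

def detect_document(state):  return state  # 1. locate, crop, and orient the ID
def detect_subfields(state): return state  # 2. locate name, DOB, photo, holograms
def detect_face(state):      return state  # 3. find the face, compute an embedding
def search_faces(state):     return state  # 4. query the face-embedding database
def read_and_match(state):   return state  # 5. OCR the fields, compare to user input
def screen_fraud(state):     return state  # 6. internal/external fraud screening
def check_security(state):   return state  # 7. forgery and manipulation checks

PIPELINE = [detect_document, detect_subfields, detect_face, search_faces,
            read_and_match, screen_fraud, check_security]

def verify(image, provided_info):
    state = {"image": image, "provided_info": provided_info, "checks": {}}
    for stage in PIPELINE:       # stages are stubs here; real versions are
        state = stage(state)     # sketched in the sections that follow
    return all(state["checks"].values())  # 8. combine insights into a decision
```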

[Figure: ID Verification Process Flow]

Document Detection

Since we allow users to submit a photo taken outside a Synapse-backed interface, the photos vary widely in framing, artistic skill, camera quality, lighting, and environment. We designed a system that leverages a range of deep learning and computer vision techniques to “read” IDs in photos captured in less-than-ideal conditions².

Document Detection is achieved through a region-based object detection model, which both detects objects and localizes them in the image. We trained an object detection model to predict the location and orientation of the ID and to classify the ID type. Using this information, we crop the ID (the area of interest) from the image and transform it into the correct orientation. The cropping and transformation steps ensure that we have standardized, similarly focused, enhanced-quality images with the ID in the center³.
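
As a concrete illustration, the snippet below runs a torchvision Faster R-CNN (an assumption; the post does not name the architecture) and crops the top detection. A production model would be fine-tuned to predict ID location, orientation, and type; the COCO weights loaded here are only a stand-in.

```python
# Minimal detection-and-crop sketch. Loading COCO weights is a stand-in for a
# model fine-tuned to detect IDs and predict their orientation and type.
import cv2
import numpy as np
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_and_crop(image_bgr: np.ndarray) -> np.ndarray:
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]          # boxes come back sorted by score
    if len(pred["boxes"]) == 0 or pred["scores"][0] < 0.8:
        raise ValueError("no ID detected with sufficient confidence")
    x1, y1, x2, y2 = pred["boxes"][0].int().tolist()
    crop = image_bgr[y1:y2, x1:x2]
    # A fine-tuned model would also predict orientation; the crop would then be
    # rotated (e.g. cv2.rotate) or perspective-corrected (cv2.warpPerspective).
    return cv2.resize(crop, (640, 400))    # standardize size for later stages
```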



[Figure: Document Detection]

Subfields Detection

The overwhelming majority of IDs are genuine, but once in a while we encounter fraudulent users who try to trick the system with fake IDs. This is why simply extracting text from the document is not enough; we need to ensure that the ID is authentic. To accurately detect the text on a user’s ID and identify fraudulent IDs, it became imperative that we train a model to learn the patterns within an ID.

[Figure: Subfields Detection]

For Subfields Detection, we fine-tuned a pre-trained model on hundreds of thousands of documents to learn the patterns of valid documents and detect key features within the ID. Our model learns to focus on specific regions of the ID that it deems to be markers of a valid ID: it learns to detect holograms and logos, and to recognize when the positions of certain text fields (for example, name and date of birth) are unusual for the ID type. To further protect against fraudulent IDs, we also implemented additional security-feature techniques to detect forgeries or digital manipulations of an ID’s contents and to identify fake documents that closely resemble valid ones⁴.
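
A common way to implement this kind of transfer learning (see footnote 4) with torchvision is to load a pre-trained detector and swap its classification head for one covering our own field classes, then fine-tune on labeled IDs. The class list below is illustrative, not Synapse’s actual label set.

```python
# Transfer-learning sketch: reuse a pre-trained backbone, replace the head.
# The field classes below are illustrative.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

FIELD_CLASSES = ["background", "name", "date_of_birth", "id_number",
                 "expiration_date", "portrait", "hologram", "logo"]

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, len(FIELD_CLASSES))
# Fine-tune on labeled ID images: the pre-trained features transfer, so far
# less data is needed than training from scratch.
```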

[Figure: Example of a fake ID]


Face Detection

Another tool we use to ensure robust protection against fraud is face detection. A common way fraudsters try to fool the system is by submitting a library card (which usually doesn’t carry the cardholder’s photo) or an ID with a cartoon face in lieu of a human face. To prevent IDs like these from sneaking through our system, we added human face detection as another verification feature.

[Figure: Face Detection]

Once the model has detected the face, we encode the face region as a vector of 128 numbers. Each face embedding describes different facial characteristics, including eye color, nose shape, and skin tone, among others. The model is trained to optimize the embeddings so that vectors of visually similar faces lie close to each other in the embedding space, while vectors for people with different facial features lie far apart. For security purposes, we encrypt each embedding before storing it in the database.
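
Our face model is internal, but as an illustration, the open-source face_recognition library likewise produces 128-dimensional encodings and compares them by Euclidean distance; the filename below is hypothetical.

```python
# Illustrative embedding-and-compare sketch with the open-source
# face_recognition library; "id_crop.jpg" is a hypothetical input file.
import face_recognition
import numpy as np

image = face_recognition.load_image_file("id_crop.jpg")
encodings = face_recognition.face_encodings(image)   # one 128-d vector per face
if not encodings:
    raise ValueError("no human face detected on the document")
embedding = encodings[0]

def same_person(a: np.ndarray, b: np.ndarray, threshold: float = 0.6) -> bool:
    # Visually similar faces map to nearby vectors; 0.6 is the library's
    # conventional same-person distance threshold.
    return float(np.linalg.norm(a - b)) < threshold
```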

As part of mitigating the risk of identity theft and fraudulent activity, we built a facial similarity search engine. It queries and compares detected face vectors against our embedding database to flag users who have established (or attempted to establish) multiple accounts with the same or a similar government ID. This helps Synapse identify duplicate users and prevents account opening by those who have already committed fraud in our systems.
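
The post does not name the search backend; a minimal sketch with FAISS, a common choice for nearest-neighbor search over embeddings, might look like this (the stored vectors below are placeholder data, and the 0.6 threshold mirrors the example above).

```python
# Nearest-neighbor duplicate search sketch with FAISS; index type, placeholder
# data, and threshold are illustrative choices.
import faiss
import numpy as np

DIM = 128
known_embeddings = np.random.rand(1000, DIM).astype("float32")  # placeholder

index = faiss.IndexFlatL2(DIM)   # exact L2 search; larger stores would use
index.add(known_embeddings)      # an approximate index instead

def near_duplicates(query: np.ndarray, threshold: float = 0.6, k: int = 5):
    """Return ids of stored faces within `threshold` distance of the query."""
    dists, ids = index.search(query.reshape(1, -1).astype("float32"), k)
    # IndexFlatL2 returns *squared* distances, hence the square root.
    return [int(i) for d, i in zip(dists[0], ids[0])
            if i != -1 and d ** 0.5 < threshold]
```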

Data Extraction

After the provided document has passed the initial validity checks, we use Optical Character Recognition (OCR) to “read” information within the document. OCR converts the image of text into raw text that can be processed and stored in our system. Instead of extracting text from the entire document, we only extract from specific fields identified in the Subfield Detection process⁵.

Using OCR and various text processing techniques, we are able to extract personal details and ID information (ID number, issuance and expiration dates, etc.), thereby eliminating the need for manual entry. Subsequently, we can use the data to verify that the user’s provided information matches the document, as well as to identify high-risk users by comparing the data against internal fraud predictors and external databases as part of our KYC verification processes.
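
A minimal sketch of region-specific OCR and tolerant matching, using pytesseract and the standard library; `id_crop` and `fields` are assumed to be outputs of the Document Detection and Subfield Detection steps above.

```python
# Region-specific OCR plus fuzzy matching sketch. `id_crop` and `fields` are
# assumed outputs of the earlier detection stages.
import difflib
import pytesseract

def read_field(doc_image, box):
    """OCR a single detected field rather than the whole document."""
    x1, y1, x2, y2 = box
    return pytesseract.image_to_string(doc_image[y1:y2, x1:x2]).strip()

def matches(extracted: str, provided: str, min_ratio: float = 0.85) -> bool:
    """Tolerate small OCR errors instead of requiring an exact string match."""
    ratio = difflib.SequenceMatcher(None, extracted.lower(),
                                    provided.lower()).ratio()
    return ratio >= min_ratio

# e.g. matches(read_field(id_crop, fields["name"]), provided_info["legal_name"])
```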

[Figure: Text Extraction and Matching]

Manual Intervention

Combining the insights from the previous steps, we then determine whether the provided document is a valid, authentic ID. Cases that are considered high risk or that share characteristics with previously seen fraudulent users are marked for manual review, as are cases that cannot be categorized with an appropriate degree of confidence. In these cases, the operations team validates and classifies the documents. Our manual review process not only examines users with indicators of fraud but also ensures that our system is not discriminating against any segment of users.

[Figure: ID Verification Process]

Semi-Automated Retraining

Labeling large amounts of images for object detection is a mundane and time-consuming task. In an effort to alleviate the workload for our in-house data labeling team, we implemented a human-in-the-loop system using pseudo-labeling. Instead of labeling each document from scratch, annotators only need to fine-tune the predictions made by the current model. This makes the annotation process faster and less prone to mistakes, and it has the added benefit of acting as a feedback loop that teaches us about our model’s biases and limitations. The model learns from the manual classifications and, over time, can spot patterns and closely mirror the manual results. This is accomplished by semi-automatically retraining the model on the newer data together with the manually curated data. The process allows us to adapt our models based on learnings from real production data, creating a continuous improvement flow.
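
A minimal sketch of the pre-labeling step, with a hypothetical `model.predict` API: confident predictions become draft labels, and only the rest are queued for annotators to correct.

```python
# Pseudo-labeling sketch: the current model pre-annotates incoming images.
# `model.predict` is an illustrative API, not a real library call.

def pseudo_label(model, images, min_confidence=0.9):
    accepted, review_queue = [], []
    for img in images:
        pred = model.predict(img)  # boxes, classes, and per-box scores
        if pred.scores and min(pred.scores) >= min_confidence:
            accepted.append((img, pred))       # used as-is for retraining
        else:
            review_queue.append((img, pred))   # annotators fine-tune these
    return accepted, review_queue

# Periodically retrain on accepted plus human-corrected examples, so real
# production data continuously feeds back into the model.
```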

[Figure: Semi-Automated Retraining Flowchart]

What’s Next

Leveraging our existing computer vision pipeline, we are able to build new physical-document products at a much faster pace. These products include:

  1. Remote Deposit Capture (RDC): process digital images of checks, allowing users to deposit a check into a bank account from a remote location.
  2. Employer Identification Number (EIN): identify a business entity with its identification number and verify the authenticity of supporting documentation.
  3. Proof of Address: verify user residency and the authenticity of submitted documents.

We can also expand the variety of documents we accept for user onboarding and verification. For example, for international students in the U.S., we can quickly train a model to verify Form DS-2019 or Form I-20, providing another identity verification option for a market segment that has traditionally had trouble opening a bank account.

In addition to internationalizing the models for alternate forms of identity verification and enhanced due diligence as described above, we also plan to train models on specific national IDs in alignment with our expansion into a given country. We already have a pipeline that specializes in validating passports — more on this to come, so stay tuned!

Team Members: Rakesh Ravi, Adhish Thite

Thanks to Yona Koch-Fienberg for collaborating on the project and the blog.

–––

Footnotes

[1] Computer vision is the field that deals with empowering computers with the ability to “see” the way humans do. It has been an active area of research for decades, but only recently has the field achieved escape velocity for business applications. The widespread adoption of Convolutional Neural Networks (CNNs) has accelerated developments in the field of computer vision.

[2] This is a complex problem in itself since we accept U.S. state-issued and federal-issued identification, as well as passports issued by other countries. State-issued IDs can include driver’s licenses and identification cards. Federal-issued identifications include passports, passport cards, employment authorization cards (EADs), and permanent resident cards (“green cards”), just to name a few. For each type of ID, there are different designs depending on the issuing entity (state or country), the year of issuance, and in some places, the ID holder themselves (age, diplomatic status, etc.). Occasionally, users may attempt to fool the system by providing non-government-issued IDs, such as a library card, a credit card, or simply a piece of paper with user information written on it.

[3] This is a crucial step toward reducing the bias that Convolutional Neural Networks have against lower-quality images, and it allows us to improve model performance by normalizing object scale. It is important to ensure that we are building models that bring equity to the onboarding process and do not disproportionately impact individuals of lower socioeconomic status (SES).

[4] In an effort to speed up training and eliminate the need for the enormous volume of training data generally required for deep learning models, we chose to utilize transfer learning. Transfer learning allows us to use the knowledge gained from other tasks to tackle new but similar problems quickly and effectively. For instance, it would be faster and easier for a child to learn how to draw a puppy if she has already learned how to draw a cat: drawing a cat requires the child to learn basic shapes, which are the building blocks for drawing a puppy, too. Recent advances in transfer learning have relaxed the requirement that the problems be closely related, extending the technique to a broader range of applications.

[5] Text in the background can interfere with the text matching process; the matcher would not be able to distinguish between text on the user’s ID and text on, say, a newspaper in the background. As a result, region-specific text extraction and matching provide additional security layers for document validation.

[6] ID Score is quite helpful in establishing the source of truth since it verifies the user’s name, email, address, date of birth, etc., independent of what is shown on the ID document of the user.
