To identify duplicate payments in Excel using VB script, you can create a macro that checks for similarities in fields like amounts, vendor numbers, invoice numbers, dates, and references.
Below is an example of a basic VB script for this purpose:
VB Script:
Sub IdentifyDuplicatePayments()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1") ' Change as needed
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' Assuming data starts from row 1
Dim i As Long, j As Long
For i = 2 To lastRow
For j = i + 1 To lastRow
' Check for duplicate criteria: Amount (Column C), Vendor Number (Column B), Invoice Number (Column D), and Date (Column E)
If ws.Cells(i, 3).Value = ws.Cells(j, 3).Value And _
ws.Cells(i, 2).Value = ws.Cells(j, 2).Value And _
InStr(ws.Cells(i, 4).Value, ws.Cells(j, 4).Value) > 0 And _
ws.Cells(i, 5).Value = ws.Cells(j, 5).Value Then
' Highlight duplicate rows
ws.Rows(i).Interior.Color = RGB(255, 0, 0)
ws.Rows(j).Interior.Color = RGB(255, 0, 0)
End If
Next j
Next i
MsgBox "Duplicate Payments Highlighted", vbInformation
End Sub
How it works:
- Columns: Modify the script to match your column structure. In this example:
- Column B: Vendor Number
- Column C: Amount
- Column D: Invoice Number
- Column E: Date
- Logic: The script loops through each row and checks for duplicates in amounts, vendor numbers, similar invoice numbers (partial match using
InStr
), and exact matching dates.
- Highlighting: If duplicates are found, the script highlights both rows in red.
- Adjust Criteria: You can tweak the conditions as needed based on what "similar" means for your use case (e.g., you can modify
InStr
for different matching logic for invoice numbers).
Using Excel's Built-In Features
If you don't want to use VB, you can leverage Excel's conditional formatting and functions like COUNTIFS
, MATCH
, and FILTER
to detect duplicates based on these criteria as well.
How Can We Make the Code More Capable of Spotting Typical Human Errors?
Top of Page
To make the code more capable of spotting typical human errors, such as minor data entry mistakes or formatting inconsistencies, we can enhance the VB script by introducing some fuzzy matching techniques. Below are some changes that can be made:
Enhancements:
- Partial Matching for Invoice Numbers:
- Use a fuzzy string comparison technique to handle common typos.
- Date Tolerances:
- Allow for date comparisons within a certain range (e.g., +/- 1 day).
- Amount Rounding:
- Compare amounts within a small variance (e.g., allowing for rounding errors).
Modified VB Script:
Sub IdentifyDuplicatePaymentsWithErrors()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1") ' Change to your sheet name
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' Assuming data starts from row 1
Dim i As Long, j As Long
Dim tolerance As Double
tolerance = 0.05 ' Tolerance for rounding errors in amount (5 cents)
For i = 2 To lastRow
For j = i + 1 To lastRow
' Check for duplicate criteria:
' 1. Similar amounts (allowing small rounding errors)
' 2. Exact Vendor Number
' 3. Fuzzy Invoice Number (small differences allowed)
' 4. Date within 1 day tolerance
If Abs(ws.Cells(i, 3).Value - ws.Cells(j, 3).Value) <= tolerance And _ ' Amount
ws.Cells(i, 2).Value = ws.Cells(j, 2).Value And _ ' Vendor Number
FuzzyMatch(ws.Cells(i, 4).Value, ws.Cells(j, 4).Value) >= 0.8 And _ ' Invoice Number (Fuzzy match)
Abs(DateDiff("d", ws.Cells(i, 5).Value, ws.Cells(j, 5).Value)) <= 1 Then ' Date within 1 day tolerance
' Highlight duplicate rows
ws.Rows(i).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
ws.Rows(j).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
End If
Next j
Next i
MsgBox "Duplicate Payments with Human Errors Highlighted", vbInformation
End Sub
' Function for fuzzy matching invoice numbers (Levenshtein distance or Jaccard Index)
Function FuzzyMatch(str1 As String, str2 As String) As Double
' This is a placeholder for a simple similarity calculation (or use Levenshtein)
' You can implement more advanced matching algorithms like Jaccard Index
Dim ratio As Double
ratio = Application.WorksheetFunction.Lenv(str1) - Application.WorksheetFunction.Lenv(str2)
' Assuming we return a similarity ratio (e.g., 0.8 for 80% similarity)
FuzzyMatch = 1 - Abs(ratio) / Application.WorksheetFunction.Max(Len(str1), Len(str2))
End Function
Key Modifications:
-
Fuzzy Matching for Invoice Numbers: The FuzzyMatch
function simulates a fuzzy match for invoice numbers to catch small typos. A ratio of 0.8 means an 80% match between strings is considered similar.
-
Amount Comparison: A tolerance of 0.05
allows for small rounding errors when comparing payment amounts, which can occur due to data entry issues or currency formatting.
-
Date Tolerance: The DateDiff
function compares dates within a 1-day range, allowing for possible minor input errors like entering the wrong date.
-
Highlighting Potential Duplicates: Duplicates that meet the error-tolerant criteria are highlighted in yellow, indicating that they are not exact matches but may still be duplicates.
Further Improvements:
- Implement more advanced string matching algorithms like Levenshtein Distance for better fuzzy comparison.
- Adjust tolerance levels based on typical error patterns in your dataset (e.g., larger rounding errors or looser date matching).
Typically Human Errors are the Next Key Along and So One Day is Not Encompassing Enough
Top of Page
To handle typical human errors, such as pressing the next key along on the keyboard, you can enhance the VB script to incorporate common key entry mistakes (like mistyping adjacent keys) and extend the date range tolerance to more than one day. Here's how to modify the code:
Improvements to the Script:
- Typo Correction for Invoice Numbers: Use a function to identify common key-entry mistakes (adjacent keys).
- Extended Date Tolerance: Allow for a date difference of up to 7 days.
Enhanced VB Script:
Sub IdentifyDuplicatePaymentsWithHumanErrors()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1") ' Adjust to your worksheet
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' Assuming data starts from row 1
Dim i As Long, j As Long
Dim tolerance As Double
tolerance = 0.05 ' Tolerance for rounding errors in amount
For i = 2 To lastRow
For j = i + 1 To lastRow
' Duplicate criteria:
' 1. Similar amounts (allow small rounding errors)
' 2. Same Vendor Number
' 3. Fuzzy match for invoice numbers (allowing typos)
' 4. Dates within 7 days tolerance
If Abs(ws.Cells(i, 3).Value - ws.Cells(j, 3).Value) <= tolerance And _ ' Amount
ws.Cells(i, 2).Value = ws.Cells(j, 2).Value And _ ' Vendor Number
IsSimilar(ws.Cells(i, 4).Value, ws.Cells(j, 4).Value) >= 0.8 And _ ' Invoice Number (fuzzy match)
Abs(DateDiff("d", ws.Cells(i, 5).Value, ws.Cells(j, 5).Value)) <= 7 Then ' Date within 7 days tolerance
' Highlight duplicate rows
ws.Rows(i).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
ws.Rows(j).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
End If
Next j
Next i
MsgBox "Duplicate Payments with Human Errors Highlighted", vbInformation
End Sub
' Fuzzy match function to handle adjacent key typos
Function IsSimilar(str1 As String, str2 As String) As Double
Dim ratio As Double
ratio = Application.WorksheetFunction.Lenv(str1) - Application.WorksheetFunction.Lenv(str2)
' Leverage a custom typo handler for common errors
If IsCommonTypo(str1, str2) Then
IsSimilar = 0.9 ' Adjust for typo allowance
Else
IsSimilar = 1 - Abs(ratio) / Application.WorksheetFunction.Max(Len(str1), Len(str2))
End If
End Function
' Function to catch common typing errors (adjacent key typos)
Function IsCommonTypo(str1 As String, str2 As String) As Boolean
Dim i As Integer
Dim typoFound As Boolean
typoFound = False
' Check if the strings are of the same length
If Len(str1) = Len(str2) Then
For i = 1 To Len(str1)
If Mid(str1, i, 1) <> Mid(str2, i, 1) Then
If AdjacentKeys(Mid(str1, i, 1), Mid(str2, i, 1)) Then
typoFound = True
Exit For
End If
End If
Next i
End If
IsCommonTypo = typoFound
End Function
' Define adjacent key logic (partial example, expand as needed)
Function AdjacentKeys(char1 As String, char2 As String) As Boolean
' This function returns True if char1 and char2 are adjacent on a typical keyboard
Select Case char1
Case "q": AdjacentKeys = (char2 = "w")
Case "w": AdjacentKeys = (char2 = "q" Or char2 = "e")
Case "e": AdjacentKeys = (char2 = "w" Or char2 = "r")
' Expand with more keyboard neighbors...
Case Else: AdjacentKeys = False
End Select
End Function
Key Enhancements:
- Handling Typing Errors: The
IsCommonTypo
function checks if the invoice numbers contain adjacent key typos (e.g., "q" instead of "w"). This accounts for typical human errors during data entry.
- Extended Date Tolerance: The script now allows for a date tolerance of 7 days, as human errors can easily involve selecting the wrong day or week.
- Small Variances in Amounts: A tolerance for small rounding errors is still present, allowing amounts to differ slightly due to minor rounding mistakes.
How It Works:
- Partial String Matches: Invoice numbers with slight errors (such as mistyped characters) are considered a match if they have an 80% similarity score or higher.
- Adjacent Key Detection: The
AdjacentKeys
function looks for common typos based on the proximity of keys on a standard keyboard. Expand the list of adjacent keys as needed for better accuracy.
- Date Flexibility: The script checks if the dates are within a 7-day range to account for potential errors in entering dates.
This approach helps catch common human mistakes, like typing errors and small variations in dates and amounts, thus increasing the likelihood of detecting duplicate payments that might not be exact matches but still represent potential duplicates.
Often Invoices are Hurried with Added Characters to Circumvent Duplicate Detection
Top of Page
To address the issue where a spurious character is added to invoice numbers to circumvent duplicate detection, we can enhance the script by implementing fuzzy logic that can ignore or flag additional characters and check for close matches. Here’s how to modify the script:
Enhancements:
- Ignore Non-Alphanumeric Characters: Strip out special characters (e.g., dashes, underscores, or additional characters) before comparing invoice numbers.
- Similarity Threshold: Use a fuzzy match that allows a certain percentage of similarity between invoice numbers, identifying potential duplicates despite extra characters.
Revised VB Script:
Sub IdentifyDuplicatePaymentsWithSpuriousCharacters()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1") ' Adjust to your worksheet
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' Assuming data starts from row 1
Dim i As Long, j As Long
Dim tolerance As Double
tolerance = 0.05 ' Tolerance for rounding errors in amount
For i = 2 To lastRow
For j = i + 1 To lastRow
' Duplicate criteria:
' 1. Similar amounts (allow small rounding errors)
' 2. Same Vendor Number
' 3. Fuzzy match for invoice numbers (accounting for spurious characters)
' 4. Dates within 7 days tolerance
If Abs(ws.Cells(i, 3).Value - ws.Cells(j, 3).Value) <= tolerance And _ ' Amount
ws.Cells(i, 2).Value = ws.Cells(j, 2).Value And _ ' Vendor Number
FuzzyMatch(StripNonAlphanumeric(ws.Cells(i, 4).Value), StripNonAlphanumeric(ws.Cells(j, 4).Value)) >= 0.85 And _ ' Invoice Number (fuzzy match)
Abs(DateDiff("d", ws.Cells(i, 5).Value, ws.Cells(j, 5).Value)) <= 7 Then ' Date within 7 days tolerance
' Highlight duplicate rows
ws.Rows(i).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
ws.Rows(j).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
End If
Next j
Next i
MsgBox "Duplicate Payments with Spurious Characters Highlighted", vbInformation
End Sub
' Function to strip non-alphanumeric characters from invoice numbers
Function StripNonAlphanumeric(str As String) As String
Dim result As String
Dim i As Integer
For i = 1 To Len(str)
If Mid(str, i, 1) Like "[A-Za-z0-9]" Then
result = result & Mid(str, i, 1)
End If
Next i
StripNonAlphanumeric = result
End Function
' Fuzzy match function for invoice numbers (Levenshtein distance, simplified)
Function FuzzyMatch(str1 As String, str2 As String) As Double
Dim ratio As Double
ratio = Application.WorksheetFunction.Lenv(str1) - Application.WorksheetFunction.Lenv(str2)
' Simple similarity calculation (adjust for length differences)
FuzzyMatch = 1 - Abs(ratio) / Application.WorksheetFunction.Max(Len(str1), Len(str2))
End Function
Key Changes:
- Strip Non-Alphanumeric Characters: The
StripNonAlphanumeric
function removes any extra characters added to circumvent duplicate detection. This makes the comparison more reliable.
- Fuzzy Matching: The
FuzzyMatch
function compares the cleaned-up invoice numbers and identifies potential matches with a similarity threshold of 85%. This accounts for extra or spurious characters without requiring exact matches.
- Extended Date Range: The date tolerance remains at 7 days to catch potential errors related to dates.
Benefits:
- This enhanced script can detect duplicates even when extra characters are inserted into the invoice number to bypass system checks, catching common manipulation or human errors.
- By focusing on essential parts of the invoice numbers (ignoring non-alphanumeric characters), you reduce the chances of missing duplicate payments.
Mistakes Occur in Month and Year as Often as Date
Top of Page
To account for human errors in month and year entries, we can apply a fuzzy matching concept to date fields as well. This will help detect mistakes such as incorrect months (e.g., entering “03” instead of “04”) or years (e.g., “2023” instead of “2022”). Below is the modified script that applies the fuzzy match to months and years in addition to invoice numbers:
Enhanced VB Script to Include Fuzzy Match for Date Fields:
Sub IdentifyDuplicatePaymentsWithFuzzyDates()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1") ' Adjust to your worksheet
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' Assuming data starts from row 1
Dim i As Long, j As Long
Dim tolerance As Double
tolerance = 0.05 ' Tolerance for rounding errors in amount
For i = 2 To lastRow
For j = i + 1 To lastRow
' Duplicate criteria:
' 1. Similar amounts (allow small rounding errors)
' 2. Same Vendor Number
' 3. Fuzzy match for invoice numbers (accounting for spurious characters)
' 4. Fuzzy match for month and year in dates (to catch typical human errors)
If Abs(ws.Cells(i, 3).Value - ws.Cells(j, 3).Value) <= tolerance And _ ' Amount
ws.Cells(i, 2).Value = ws.Cells(j, 2).Value And _ ' Vendor Number
FuzzyMatch(StripNonAlphanumeric(ws.Cells(i, 4).Value), StripNonAlphanumeric(ws.Cells(j, 4).Value)) >= 0.85 And _ ' Invoice Number (fuzzy match)
FuzzyDateMatch(ws.Cells(i, 5).Value, ws.Cells(j, 5).Value) >= 0.8 Then ' Fuzzy match on dates
' Highlight duplicate rows
ws.Rows(i).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
ws.Rows(j).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
End If
Next j
Next i
MsgBox "Duplicate Payments with Fuzzy Dates Highlighted", vbInformation
End Sub
' Function to strip non-alphanumeric characters from invoice numbers
Function StripNonAlphanumeric(str As String) As String
Dim result As String
Dim i As Integer
For i = 1 To Len(str)
If Mid(str, i, 1) Like "[A-Za-z0-9]" Then
result = result & Mid(str, i, 1)
End If
Next i
StripNonAlphanumeric = result
End Function
' Fuzzy match function for invoice numbers (Levenshtein distance, simplified)
Function FuzzyMatch(str1 As String, str2 As String) As Double
Dim ratio As Double
ratio = Application.WorksheetFunction.Lenv(str1) - Application.WorksheetFunction.Lenv(str2)
' Simple similarity calculation (adjust for length differences)
FuzzyMatch = 1 - Abs(ratio) / Application.WorksheetFunction.Max(Len(str1), Len(str2))
End Function
' Function for fuzzy matching dates (accounting for month and year errors)
Function FuzzyDateMatch(date1 As Date, date2 As Date) As Double
Dim year1 As Integer, year2 As Integer
Dim month1 As Integer, month2 As Integer
Dim similarity As Double
year1 = Year(date1)
year2 = Year(date2)
month1 = Month(date1)
month2 = Month(date2)
' Consider year and month errors, giving partial credit for close matches
If Abs(year1 - year2) <= 1 And Abs(month1 - month2) <= 1 Then
similarity = 1 - (Abs(year1 - year2) * 0.5 + Abs(month1 - month2) * 0.25)
Else
similarity = 0
End If
FuzzyDateMatch = similarity
End Function
Explanation of Changes:
-
Date Fuzzy Matching:
- The
FuzzyDateMatch
function compares the month and year of two dates and calculates a similarity score. If the month or year differs slightly (e.g., a 1-month or 1-year difference), it gives partial credit, helping catch common human errors like entering "03" instead of "04" or "2022" instead of "2023".
- Weighting system: Mistakes in year are penalized more heavily than mistakes in month (e.g., a 1-year error has a bigger impact than a 1-month difference).
-
General Fuzzy Logic:
- Similar to the previous invoice number matching logic, the function checks for typographical or spurious characters and uses a fuzzy similarity ratio of 85% to account for typical human errors.
Benefits:
- Human Error Detection: This script now not only catches minor typographical errors in invoice numbers but also human errors in month and year inputs, making the duplicate detection process more robust.
- Weighted Similarity: The fuzzy matching system for dates allows for small errors while prioritizing more severe errors (e.g., incorrect years are penalized more than months).
This should help capture a broader range of typical human errors while minimizing false positives, ensuring a more accurate identification of potential duplicate payments
Making the Columns Your Choice
Top of Page
We can modify the code to allow users to choose the columns dynamically by prompting them to input the appropriate column numbers. Below is the modified script that adds this functionality:
Enhanced VB Script with User Input for Column Selection:
Sub IdentifyDuplicatePaymentsWithUserColumns()
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("Sheet1") ' Adjust to your worksheet
Dim lastRow As Long
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' Assuming data starts from row 1
Dim amountCol As Integer, vendorCol As Integer, invoiceCol As Integer, dateCol As Integer
Dim i As Long, j As Long
Dim tolerance As Double
tolerance = 0.05 ' Tolerance for rounding errors in amount
' Get column numbers from the user
amountCol = Application.InputBox("Enter the column number for Amount:", Type:=1)
vendorCol = Application.InputBox("Enter the column number for Vendor Number:", Type:=1)
invoiceCol = Application.InputBox("Enter the column number for Invoice Number:", Type:=1)
dateCol = Application.InputBox("Enter the column number for Date:", Type:=1)
For i = 2 To lastRow
For j = i + 1 To lastRow
' Duplicate criteria:
' 1. Similar amounts (allow small rounding errors)
' 2. Same Vendor Number
' 3. Fuzzy match for invoice numbers (accounting for spurious characters)
' 4. Fuzzy match for month and year in dates (to catch typical human errors)
If Abs(ws.Cells(i, amountCol).Value - ws.Cells(j, amountCol).Value) <= tolerance And _ ' Amount
ws.Cells(i, vendorCol).Value = ws.Cells(j, vendorCol).Value And _ ' Vendor Number
FuzzyMatch(StripNonAlphanumeric(ws.Cells(i, invoiceCol).Value), StripNonAlphanumeric(ws.Cells(j, invoiceCol).Value)) >= 0.85 And _ ' Invoice Number (fuzzy match)
FuzzyDateMatch(ws.Cells(i, dateCol).Value, ws.Cells(j, dateCol).Value) >= 0.8 Then ' Fuzzy match on dates
' Highlight duplicate rows
ws.Rows(i).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
ws.Rows(j).Interior.Color = RGB(255, 255, 0) ' Yellow for potential duplicates
End If
Next j
Next i
MsgBox "Duplicate Payments with Fuzzy Dates Highlighted", vbInformation
End Sub
' Function to strip non-alphanumeric characters from invoice numbers
Function StripNonAlphanumeric(str As String) As String
Dim result As String
Dim i As Integer
For i = 1 To Len(str)
If Mid(str, i, 1) Like "[A-Za-z0-9]" Then
result = result & Mid(str, i, 1)
End If
Next i
StripNonAlphanumeric = result
End Function
' Fuzzy match function for invoice numbers
Function FuzzyMatch(str1 As String, str2 As String) As Double
Dim ratio As Double
ratio = Application.WorksheetFunction.Lenv(str1) - Application.WorksheetFunction.Lenv(str2)
FuzzyMatch = 1 - Abs(ratio) / Application.WorksheetFunction.Max(Len(str1), Len(str2))
End Function
' Function for fuzzy matching dates (accounting for month and year errors)
Function FuzzyDateMatch(date1 As Date, date2 As Date) As Double
Dim year1 As Integer, year2 As Integer
Dim month1 As Integer, month2 As Integer
Dim similarity As Double
year1 = Year(date1)
year2 = Year(date2)
month1 = Month(date1)
month2 = Month(date2)
' Consider year and month errors
If Abs(year1 - year2) <= 1 And Abs(month1 - month2) <= 1 Then
similarity = 1 - (Abs(year1 - year2) * 0.5 + Abs(month1 - month2) * 0.25)
Else
similarity = 0
End If
FuzzyDateMatch = similarity
End Function
How It Works:
-
User Input: The script now prompts the user to enter the column numbers for the amount, vendor number, invoice number, and date fields. This makes the script adaptable to different data layouts.
-
Dynamic Column Assignment: The columns specified by the user are used for duplicate detection.
Benefits:
- Flexibility: The user can now apply this script to datasets with varying column structures without modifying the code.
- Customization: It allows for easy adaptation to different spreadsheets and column setups, improving usability.
This enhancement provides more flexibility while maintaining all of the core fuzzy matching capabilities to detect duplicate payments.
How Do I Use This Script in Excel Desktop and 365?
Top of Page
To use this VB script in Excel Desktop and Excel 365, follow these steps:
1. Open the Visual Basic for Applications (VBA) Editor:
- Press
Alt + F11
in Excel to open the VBA editor.
2. Insert a New Module:
- In the VBA editor, go to
Insert
> Module
. This will open a blank module.
3. Paste the Script:
- Copy the provided VB script and paste it into the module.
4. Close the VBA Editor:
- Once the script is pasted, press
Ctrl + S
to save, then close the VBA editor (Alt + Q
).
5. Run the Macro:
- Back in Excel, press
Alt + F8
to open the Macro list.
- Select the
IdentifyDuplicatePaymentsWithUserColumns
macro and click Run
.
- The script will prompt you to input the column numbers for amount, vendor number, invoice number, and date, and then highlight potential duplicates based on the logic in the script.
Notes:
- Make sure macros are enabled in Excel. If they are disabled, you'll need to enable them by going to
File > Options > Trust Center > Trust Center Settings > Macro Settings
and select "Enable all macros" (or selectively enable them for trusted workbooks).
Summing Up:
Top of Page
These scripts are to give you lines to check in your accounting systems and are far from completely accurate or complete but they do address a lot of simple tests that accountants and auditors are checking on a daily basis and may be helpful if you are performing duplicates detection tests or an accounts payable audit for a client.