Return only digits 0-9 from a string
Tags: C#, VB.NET, Regex, VBScript, Code Generation
C# Problem Overview
I need a regular expression that I can use in VBScript and .NET that will return only the numbers that are found in a string.
For example, any of the following strings should return only 1231231234:
- 123 123 1234
- (123) 123-1234
- 123-123-1234
- (123)123-1234
- 123.123.1234
- 123 123 1234
- 1 2 3 1 2 3 1 2 3 4
This will be used in an email parser to find telephone numbers that customers may provide in the email and do a database search.
I may have missed a similar regex but I did search on regexlib.com.
[EDIT] - Added code generated by RegexBuddy after setting up musicfreak's answer
VBScript Code
Dim myRegExp, ResultString
Set myRegExp = New RegExp
myRegExp.Global = True
myRegExp.Pattern = "[^\d]"
ResultString = myRegExp.Replace(SubjectString, "")
VB.NET
Dim ResultString As String
Try
    Dim RegexObj As New Regex("[^\d]")
    ResultString = RegexObj.Replace(SubjectString, "")
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
C#
string resultString = null;
try {
    Regex regexObj = new Regex(@"[^\d]");
    resultString = regexObj.Replace(subjectString, "");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
C# Solutions
Solution 1 - C#
In .NET, you could extract just the digits from the string. Like this:
string justNumbers = new String(text.Where(Char.IsDigit).ToArray());
Solution 2 - C#
As an alternative to the main .NET solution, adapted from a similar question's answer:
string justNumbers = string.Concat(text.Where(char.IsDigit));
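One caveat for both LINQ one-liners: char.IsDigit returns true for any Unicode decimal digit, not just 0-9. If the goal is strictly the ASCII digits from the title, a range check is a small change. The sketch below is illustrative only; the names DigitFilter and AsciiDigitsOnly are invented for this example.

```csharp
using System;
using System.Linq;

public static class DigitFilter
{
    // Keeps only the ASCII characters '0' through '9'.
    // (Hypothetical helper name, not part of any library.)
    public static string AsciiDigitsOnly(string text)
    {
        return new string(text.Where(c => c >= '0' && c <= '9').ToArray());
    }
}

public static class Program
{
    public static void Main()
    {
        // Formats from the question collapse to the same digits.
        Console.WriteLine(DigitFilter.AsciiDigitsOnly("(123) 123-1234"));      // 1231231234
        Console.WriteLine(DigitFilter.AsciiDigitsOnly("1 2 3 1 2 3 1 2 3 4")); // 1231231234

        // U+0663 (Arabic-Indic digit three) passes char.IsDigit but is dropped here.
        Console.WriteLine(char.IsDigit('\u0663'));                  // True
        Console.WriteLine(DigitFilter.AsciiDigitsOnly("12\u0663")); // 12
    }
}
```

If Unicode digits should be kept, the original char.IsDigit versions are the ones to use.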
Solution 3 - C#
I don't know if VBScript has some kind of a "regular expression replace" function, but if it does, then you could do something like this pseudocode:
reg_replace(/\D+/g, '', your_string)
I don't know VBScript so I can't give you the exact code, but this would remove anything that is not a digit.
EDIT: Make sure to have the global flag (the "g" at the end of the regexp), otherwise it will only replace the first run of non-digits in your string.
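In .NET, this pseudocode maps onto Regex.Replace, which replaces every match by default, so there is no separate global flag to set. A minimal sketch follows; the class and method names (RegexDigits, StripNonDigits) are made up for illustration, and the ECMAScript option is an optional choice added here, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDigits
{
    // Removes every run of non-digit characters from the input.
    // RegexOptions.ECMAScript makes \d (and so \D) refer to ASCII 0-9 only.
    public static string StripNonDigits(string input)
    {
        return Regex.Replace(input, @"\D+", "", RegexOptions.ECMAScript);
    }
}

public static class Program
{
    public static void Main()
    {
        string[] samples =
        {
            "123 123 1234", "(123) 123-1234", "123-123-1234",
            "(123)123-1234", "123.123.1234", "1 2 3 1 2 3 1 2 3 4"
        };

        // Each sample from the question should print 1231231234.
        foreach (string s in samples)
        {
            Console.WriteLine(RegexDigits.StripNonDigits(s));
        }
    }
}
```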
Solution 4 - C#
Note: you've only solved half the problem here.
For US phone numbers entered "in the wild", you may have:
- Phone numbers with or without the "1" prefix
- Phone numbers with or without the area code
- Phone numbers with extension numbers (if you blindly remove all non-digits, you'll miss the "x" or "Ext." or whatever also on the line).
- Possibly, numbers encoded with mnemonic letters (800-BUY-THIS or whatever)
You'll need to add some smarts to your code to conform the resulting list of digits to a single standard that you actually search against in your database.
Some simple things you could do to fix this:
- Before the RegEx removal of non-digits, see if there's an "x" in the string. If there is, chop everything off after it (will handle most versions of writing an extension number).
- For any number with 10+ digits beginning with a "1", chop off the 1. It's not part of the area code; US area codes start in the 2xx range.
- For any number still exceeding 10 digits, assume the remainder is an extension of some sort, and chop it off.
- Do your database search using an "ends-with" pattern search (SELECT * FROM mytable WHERE phonenumber LIKE '%blah'). This will handle situations (although with the possibility of error) where the area code is not provided, but your database has the number with the area code.
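The steps above could be sketched in C# roughly as follows. This is one possible reading, not a tested production routine: the names PhoneNormalizer, Normalize and EndsWithPattern are hypothetical, the leading "1" is only dropped when more than 10 digits are present (an assumption, so that a bare 10-digit number is left alone), and mnemonic letters such as 800-BUY-THIS are not handled.

```csharp
using System;
using System.Linq;

public static class PhoneNormalizer
{
    // Reduces a US phone number as typed by a user to at most 10 digits.
    public static string Normalize(string raw)
    {
        // 1. Cut at the first "x" so "x405", "ext 405" and "extension 405" are dropped.
        int x = raw.IndexOf("x", StringComparison.OrdinalIgnoreCase);
        string main = x >= 0 ? raw.Substring(0, x) : raw;

        // 2. Keep only the ASCII digits.
        string digits = new string(main.Where(c => c >= '0' && c <= '9').ToArray());

        // 3. Drop a leading "1" (country code) when more than 10 digits remain.
        if (digits.Length > 10 && digits[0] == '1')
        {
            digits = digits.Substring(1);
        }

        // 4. Treat anything beyond 10 digits as an unmarked extension.
        if (digits.Length > 10)
        {
            digits = digits.Substring(0, 10);
        }

        return digits;
    }

    // Builds the LIKE argument for an ends-with search (leading wildcard).
    public static string EndsWithPattern(string digits)
    {
        return "%" + digits;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(PhoneNormalizer.Normalize("1 (226) 123-4567 ext 405")); // 2261234567
        Console.WriteLine(PhoneNormalizer.Normalize("123-1234"));                 // 1231234
        Console.WriteLine(PhoneNormalizer.EndsWithPattern("1231234"));            // %1231234
    }
}
```

The pattern string is meant to be passed to the query as a parameter rather than concatenated into the SQL text.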
Solution 5 - C#
By the looks of things, you're trying to catch any 10-digit phone number.
Why not first do a string replace on the text to remove any of the following characters?
<SPACE> , . ( ) - [ ]
Then afterwards, you can just do a regex search for a 10 digit number.
\d{10}
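A rough C# sketch of this two-step idea, under the assumption that the whole email text is searched at once; TenDigitFinder and FindTenDigits are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TenDigitFinder
{
    // Characters the answer proposes removing: space , . ( ) - [ ]
    private static readonly char[] Separators = { ' ', ',', '.', '(', ')', '-', '[', ']' };

    // Returns the first run of 10 digits found after removing separators, or null if none.
    public static string FindTenDigits(string text)
    {
        // Splitting on the separators and re-joining removes them all in one pass.
        string stripped = string.Concat(text.Split(Separators));
        Match m = Regex.Match(stripped, @"\d{10}");
        return m.Success ? m.Value : null;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TenDigitFinder.FindTenDigits("Call me at (123) 123-1234 please")); // 1231231234
        Console.WriteLine(TenDigitFinder.FindTenDigits("no number here") == null);           // True
    }
}
```

One thing to watch: removing spaces can glue unrelated numbers together, so "order 42 123 123 1234" would yield 4212312312 rather than the phone number.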
Solution 6 - C#
Have you gone through the phone number category on regexlib? Quite a few of the patterns there seem to do what you need.
Solution 7 - C#
With respect to the points made by richardtallent, this code handles most of the issues around extension numbers and a prepended US country code (+1).
Not the most elegant solution, but I had to quickly solve the problem so I could move on with what I'm doing.
I hope it helps someone.
Public Shared Function JustNumbers(inputString As String) As String
    Dim outString As String = ""
    ' Number of digits that precede the extension marker "x"; -1 means no extension was found
    Dim nEnds As Integer = -1
    ' Cycle through and test the ASCII character code of each character in the string. Remove everything non-numeric except "x" (in the event an extension is in the string as follows):
    ' 331-123-3451 extension 405 becomes 3311233451x405
    ' 226-123-4567 ext 405 becomes 2261234567x405
    ' 226-123-4567 x 405 becomes 2261234567x405
    For l As Integer = 1 To inputString.Length
        Dim tmp As String = Mid(inputString, l, 1)
        If (Asc(tmp) >= 48 And Asc(tmp) <= 57) Then
            outString &= tmp
        ElseIf Asc(tmp.ToLower) = 120 Then
            ' Record where the phone number ends, measured in the output string (first "x" only)
            If nEnds = -1 Then nEnds = outString.Length
            outString &= tmp
        End If
    Next
    ' Remove the leading US country code 1 after doing some validation
    If outString.Length > 0 Then
        If Strings.Left(outString, 1) = "1" Then
            ' If the nEnds flag is still -1, that means no extension was added above, set it to the full length of the string
            ' otherwise, an extension number was detected, and that should be the nEnds (number ends) position.
            If nEnds = -1 Then nEnds = outString.Length
            ' More than 10 digits before any extension means an area code is prefixed;
            ' remove the leading 1 in case someone put in the US country code.
            ' This is technically safe, since there are no US area codes that start with a 1. The start digits are 2-9
            If nEnds > 10 Then
                outString = Right(outString, outString.Length - 1)
            End If
        End If
    End If
    Debug.Print(inputString & " : became : " & outString)
    Return outString
End Function
Solution 8 - C#
The simplest solution, without a regular expression:
public string DigitsOnly(string s)
{
    string res = "";
    for (int i = 0; i < s.Length; i++)
    {
        if (Char.IsDigit(s[i]))
            res += s[i];
    }
    return res;
}
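For long inputs, the repeated string concatenation in the loop allocates a new string on every appended digit. A variant of the same loop using StringBuilder avoids that; this is only a sketch, and the wrapper class name DigitLoop is invented here.

```csharp
using System;
using System.Text;

public static class DigitLoop
{
    // Same logic as the loop above, but appends into a StringBuilder
    // instead of creating a new string for each digit.
    public static string DigitsOnly(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (Char.IsDigit(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DigitLoop.DigitsOnly("123.123.1234")); // 1231231234
    }
}
```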