Skip to Main Content

Accented Characters

I encountered some strange behaviour when trying to remove all characters from a varchar2, except a-z, A-Z and 0-9.
Using regular expressions, you can define that in a number of ways, such as [^A-Za-z0-9] or [^[:alnum:]]

Example 1 works as expected (see the query below):
For A1 the ë (ascii 235) is not included, since it falls outside of the [^A-Za-z0-9] range.
For A2 the ë is included, since it is first converted to a regular e character.

Example 2 is (a dummy version of) the exact same query on another (also 11gR2) database and client. Here, the result for A1 was a bit different:
For A1 the ë was included.
A2 was the same.

Probably the characterset settings of the databases and/or clients were the cause? I have included the info gathered by PL/SQL Developer for both setups.
My favourite online regular expressions tool regex101.com agrees that example 1 is as expected.

Accented Characters Example 1

A1A2
Ab3X1Ab3eX1
Character size: 4 byte(s)
CharSetID: 873
NCharSetID: 2000
Unicode Support: True
NLS_LANG: AMERICAN_AMERICA.WE8MSWIN1252
NLS_CHARACTERSET: AL32UTF8
NLS_NCHAR_CHARACTERSET: AL16UTF16

Accented Characters Example 2

A1A2
Ab3ëX1Ab3eX1
Character size: 1 byte(s)
CharSetID: 31
NCharSetID: 2000
Unicode Support: True
NLS_LANG: DUTCH_THE NETHERLANDS.WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16
NLS_CHARACTERSET: WE8ISO8859P1

Value Pairs

A simple example how to split string like 'ABC:123;XYZ:789' into multiple value pairs.

Value Pairs Example

RValue 1Value 2
ABC:123ABC123
ABCD:1234ABCD1234
WXYZ:6789WXYZ6789
XYZ:789XYZ789

Phone Numbers

Creating a simple check for phone numbers was not as easy as I expected.
Getting it to work in regex101.com was easy, but that expression was not handled as expected when using it in the Oracle database.
I did get it to work by changing the expression, but I'm still not sure why the first version didn't work.

Phone Numbers Example

TelR_1R_2
088 223 22 3301
+31 (0)88 - 223 22 33 01
088-223 22 3301
088223223301
abc 12300

Code

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source