CTRL characters break XML formatting in Odata
Steps to Reproduce:
1- Update a Project text or RIch text field with any of the following characters
\2B7F https://symbl.cc/en/2B7F/ ⭿ Vertical Tab Key
\0002 https://symbl.cc/en/0002/ STX Start of Text
\000B https://symbl.cc/en/000B/ VT Vertical Tabulation
These are used in Microsoft Word and other Rich text editors.
2- Copy paste in characters into text field and save (use Modern UX)
3- Run Load Data Warehouse
4- DWH is successful
5- Make an ODATA call to that object, the call will fail because the character is invalid for XML
For example:
hexadecimal value 0x0B is an invalid character
Expected Results: DWH job filter out XML breaking characters
Actual Results: XML breaking characters are loaded into the DWH
Workaround:
Search for bad row after the fact, for example
Using Oracle:
SELECT * FROM DWH_INV_INVESTMENTS
WHERE
DESCRIPTION LIKE '%'||UNISTR('\0002')||'%'
OR DESCRIPTION LIKE '%'||UNISTR('\2B7F')||'%'
OR DESCRIPTION LIKE LIKE '%'||UNISTR('\000B')||'%'
Using Postgres:
select * from dwh_inv_investments
where description ~ '\x2B7F|\x000B|\x0002'
Once found, remove that special character or do an update statement, for example:
UPDATE INV_INVESTMENTS
SET DESCRIPTION = REGEXP_REPLACE(description, '^[:alnum:]', ' ')
WHERE...
Alternatively, if the odata client supports headers use:
header('Content-Type: application/json')
Clarity SaaS GCP and Rego Hosted: 16.1.3, 16.2, 16.2.1
Certain Unicode Control characters are invalid for XML. Odata returns data in XML format by default
This is being investigated at Broadcom under: DE80636