Sanitize Copy/Paste Text from Word
May 27, 2010 PHP Coding, Work
In a recent project I had to deal with text copied from a Microsoft Word document and pasted into a textarea. Word automatically replaces certain characters with typographic versions, for example turning three periods into an ellipsis and straight quotes into curly "smart" quotes. Inserting that text into the database caused errors, so I wrote a sanitize function that replaces these special characters with plain ASCII equivalents.
```php
// Used to sanitize Microsoft Word's special characters.
// Note: this assumes $Text is Windows-1252 (single-byte) encoded. On UTF-8
// input, chr(128)..chr(255) are continuation bytes of multibyte characters,
// and replacing them would corrupt text such as accented letters.
// Good reference: http://www.lookuptables.com
function SanitizeFromWord($Text = '')
{
    // Windows-1252 byte value => plain-ASCII replacement
    $chars = array(
        130 => ',',    // baseline single quote
        131 => 'NLG',  // florin
        132 => '"',    // baseline double quote
        133 => '...',  // ellipsis
        134 => '**',   // dagger (a second footnote)
        135 => '***',  // double dagger (a third footnote)
        136 => '^',    // circumflex accent
        137 => 'o/oo', // per mille
        138 => 'Sh',   // S hacek
        139 => '<',    // left single guillemet
        140 => 'OE',   // OE ligature
        145 => '\'',   // left single quote
        146 => '\'',   // right single quote
        147 => '"',    // left double quote
        148 => '"',    // right double quote
        149 => '-',    // bullet
        150 => '-',    // en dash
        151 => '--',   // em dash
        152 => '~',    // tilde accent
        153 => '(TM)', // trademark sign
        154 => 'sh',   // s hacek
        155 => '>',    // right single guillemet
        156 => 'oe',   // oe ligature
        159 => 'Y',    // Y dieresis
        169 => '(C)',  // copyright
        174 => '(R)'   // registered trademark
    );

    // Replace each special byte with its ASCII equivalent
    foreach ($chars as $chr => $replace) {
        $Text = str_replace(chr($chr), $replace, $Text);
    }

    return $Text;
}
```
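SanitizeFromWord() matches single Windows-1252 bytes, so it only behaves correctly when the submitted text is in that encoding. If the form and database use UTF-8 instead, Word's characters arrive as multibyte sequences. Below is a minimal sketch of a UTF-8 counterpart, assuming PHP 7.0 or later (for the `\u{...}` escape syntax). The name `sanitizeFromWordUtf8()` is hypothetical; the map restates the table above using the Unicode code points that Windows-1252 bytes 130 to 174 correspond to.

```php
<?php
// Hypothetical UTF-8 counterpart to SanitizeFromWord().
// Assumes $text is valid UTF-8 (requires PHP 7.0+ for "\u{...}" escapes).
function sanitizeFromWordUtf8(string $text): string
{
    // Unicode character => plain-ASCII replacement (cp1252 byte in comments)
    static $map = [
        "\u{201A}" => ',',    // 130 baseline single quote
        "\u{0192}" => 'NLG',  // 131 florin
        "\u{201E}" => '"',    // 132 baseline double quote
        "\u{2026}" => '...',  // 133 ellipsis
        "\u{2020}" => '**',   // 134 dagger
        "\u{2021}" => '***',  // 135 double dagger
        "\u{02C6}" => '^',    // 136 circumflex accent
        "\u{2030}" => 'o/oo', // 137 per mille
        "\u{0160}" => 'Sh',   // 138 S hacek
        "\u{2039}" => '<',    // 139 left single guillemet
        "\u{0152}" => 'OE',   // 140 OE ligature
        "\u{2018}" => "'",    // 145 left single quote
        "\u{2019}" => "'",    // 146 right single quote
        "\u{201C}" => '"',    // 147 left double quote
        "\u{201D}" => '"',    // 148 right double quote
        "\u{2022}" => '-',    // 149 bullet
        "\u{2013}" => '-',    // 150 en dash
        "\u{2014}" => '--',   // 151 em dash
        "\u{02DC}" => '~',    // 152 tilde accent
        "\u{2122}" => '(TM)', // 153 trademark sign
        "\u{0161}" => 'sh',   // 154 s hacek
        "\u{203A}" => '>',    // 155 right single guillemet
        "\u{0153}" => 'oe',   // 156 oe ligature
        "\u{0178}" => 'Y',    // 159 Y dieresis
        "\u{00A9}" => '(C)',  // 169 copyright
        "\u{00AE}" => '(R)',  // 174 registered trademark
    ];

    // Replace every mapped character in a single pass
    return strtr($text, $map);
}

echo sanitizeFromWordUtf8("\u{201C}Hello\u{201D}\u{2026} it\u{2019}s \u{2014} done\u{2122}"), PHP_EOL;
// prints: "Hello"... it's -- done(TM)
```

This sketch uses `strtr()` with an array instead of a `str_replace()` loop because `strtr()` tries the longest keys first and never rescans text it has already replaced, so each character is converted once. Because whole UTF-8 sequences are matched, other non-ASCII characters such as accented letters pass through untouched.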
Enjoy!