9.4. String Functions and Operators

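This section describes functions and operators for examining and manipulating string values. Strings in this context include values of the types character, character varying, and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of the potential effects of automatic space-padding when using the character type. The functions described here can also be applied to non-string values, provided those values are first converted to their string representation. Some functions also exist for the bit-string types.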
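As a rough illustration of the padding caveat, the query below puts a character(5) value next to the same value as text. The literal 'ab' and the column aliases are made up for this sketch. Functions differ in whether they count the pad spaces, so it is worth running this on your own server and comparing the columns.

```sql
-- Casting to character(5) pads the value with spaces to 5 characters,
-- whereas text keeps exactly what was supplied.
SELECT length('ab'::character(5))       AS char_len,
       octet_length('ab'::character(5)) AS char_bytes,
       length('ab'::text)               AS text_len,
       octet_length('ab'::text)         AS text_bytes;
```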

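SQL defines some string functions that use key words, rather than commas, to separate arguments. Details are in Table 9-5. PostgreSQL also provides versions of these functions that use the regular function invocation syntax (see Table 9-6).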

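Note: Before PostgreSQL 8.3, these functions would silently accept values of several non-string data types as well, because implicit coercions from those data types to text existed. Those coercions frequently caused surprising behavior and have been removed. However, the string concatenation operator (||) still accepts non-string input, as long as at least one input is of a string type, as shown in Table 9-5. In other cases, insert an explicit coercion to text if you need to reproduce the previous behavior.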
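The difference described in the note can be sketched as follows. The literal values are arbitrary. The first query relies on the concatenation operator accepting a non-string operand, and the other two show the explicit cast that other string functions now need.

```sql
-- Concatenation works because one operand is a string.
SELECT 'Value: ' || 42;                    -- Value: 42

-- Other string functions need an explicit cast to text.
SELECT length(CAST(12345 AS text));        -- 5
SELECT upper(CAST(current_date AS text));  -- the date rendered as text
```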

Table 9-5. SQL String Functions and Operators

Function	Return Type	Description	Example	Result
`string || string`	`text`	String concatenation	`'Post' || 'greSQL'`	`PostgreSQL`
`string || non-string` or `non-string || string`	`text`	String concatenation with one non-string input	`'Value: ' || 42`	`Value: 42`
`bit_length(string)`	`int`	Number of bits in string	`bit_length('jose')`	`32`
`char_length(string)` or `character_length(string)`	`int`	Number of characters in string	`char_length('jose')`	`4`
`lower(string)`	`text`	Convert string to lower case	`lower('TOM')`	`tom`
`octet_length(string)`	`int`	Number of bytes in string	`octet_length('jose')`	`4`
`overlay(string placing string from int [for int])`	`text`	Replace substring	`overlay('Txxxxas' placing 'hom' from 2 for 4)`	`Thomas`
`position(substring in string)`	`int`	Location of specified substring	`position('om' in 'Thomas')`	`3`
`substring(string [from int] [for int])`	`text`	Extract substring	`substring('Thomas' from 2 for 3)`	`hom`
`substring(string from pattern)`	`text`	Extract substring matching POSIX regular expression. See Section 9.7 for more information on pattern matching.	`substring('Thomas' from '...$')`	`mas`
`substring(string from pattern for escape)`	`text`	Extract substring matching SQL regular expression. See Section 9.7 for more information on pattern matching.	`substring('Thomas' from '%#"o_a#"_' for '#')`	`oma`
`trim([leading | trailing | both] [characters] from string)`	`text`	Remove the longest string containing only characters from `characters` (a space by default) from the start, the end, or both ends of `string`	`trim(both 'x' from 'xTomxx')`	`Tom`
`upper(string)`	`text`	Convert string to upper case	`upper('tom')`	`TOM`
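The keyword-based forms in Table 9-5 can be nested like any other expression. A minimal sketch, using a made-up e-mail address as input:

```sql
-- Hypothetical input value; any string containing '@' would do.
-- position(...) is 5 here, so the substring starts at character 6.
SELECT substring('user@example.com'
                 from position('@' in 'user@example.com') + 1) AS domain;
-- example.com

-- Only the leading 'x' characters are removed.
SELECT trim(leading 'x' from 'xTomxx') AS left_trimmed;   -- Tomxx
```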

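Additional string manipulation functions are available and are listed in Table 9-6. Some of them are used internally to implement the SQL-standard string functions listed in Table 9-5.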
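To make the relationship between the two tables concrete, the sketch below pairs a few regular function calls with the keyword forms that the tables describe as equivalent. The sample strings are taken from or modelled on the table examples. Each column is expected to come back true.

```sql
-- Both sides of each comparison describe the same operation.
SELECT substr('alphabet', 3, 2) = substring('alphabet' from 3 for 2) AS same_substring,
       strpos('high', 'ig')     = position('ig' in 'high')           AS same_position,
       btrim('xTomxx', 'x')     = trim(both 'x' from 'xTomxx')       AS same_trim;
```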

Table 9-6. Other String Functions

Function	Return Type	Description	Example	Result
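`ascii(string)`	`int`	ASCII code of the first character of the argument. For UTF8, returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character.	`ascii('x')`	`120`
`btrim(string text [, characters text])`	`text`	Remove the longest string consisting only of characters in `characters` (a space by default) from the start and end of `string`.	`btrim('xyxtrimyyx', 'xy')`	`trim`
`chr(int)`	`text`	Character with the given code. For UTF8, the argument is treated as a Unicode code point. For other multibyte encodings, the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes.	`chr(65)`	`A`
`convert(string bytea, src_encoding name, dest_encoding name)`	`bytea`	Convert string to `dest_encoding`. The original encoding is specified by `src_encoding`, and `string` must be valid in that encoding. Conversions can be defined with `CREATE CONVERSION`. There are also some predefined conversions; see Table 9-7 for the available ones.	`convert('text_in_utf8', 'UTF8', 'LATIN1')`	`text_in_utf8` represented in Latin-1 encoding (ISO 8859-1)
`convert_from(string bytea, src_encoding name)`	`text`	Convert string to the database encoding. The original encoding is specified by `src_encoding`, and `string` must be valid in that encoding.	`convert_from('text_in_utf8', 'UTF8')`	`text_in_utf8` represented in the current database encoding
`convert_to(string text, dest_encoding name)`	`bytea`	Convert string to `dest_encoding`.	`convert_to('some text', 'UTF8')`	`some text` represented in the UTF8 encoding
`decode(string text, type text)`	`bytea`	Decode binary data from `string`, which was previously encoded with `encode`. The options for `type` are the same as in `encode`.	`decode('MTIzAAE=', 'base64')`	`123\000\001`
`encode(data bytea, type text)`	`text`	Encode binary data into an ASCII-only representation. Supported types are `base64`, `hex`, and `escape`. `escape` outputs zero bytes as `\000` and doubles backslashes.	`encode(E'123\\000\\001', 'base64')`	`MTIzAAE=`
`initcap(string)`	`text`	Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.	`initcap('hi THOMAS')`	`Hi Thomas`
`length(string)`	`int`	Number of characters in `string`	`length('jose')`	`4`
`length(string bytea, encoding name)`	`int`	Number of characters in `string` in the given `encoding`. The `string` must be valid in this encoding.	`length('jose', 'UTF8')`	`4`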
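`lpad(string text, length int [, fill text])`	`text`	Fill up `string` to length `length` by prepending the characters `fill` (a space by default). If `string` is already longer than `length`, it is truncated on the right.	`lpad('hi', 5, 'xy')`	`xyxhi`
`ltrim(string text [, characters text])`	`text`	Remove the longest string containing only characters from `characters` (a space by default) from the start of `string`.	`ltrim('zzzytrim', 'xyz')`	`trim`
`md5(string)`	`text`	Calculate the MD5 hash of `string`, returning the result in hexadecimal.	`md5('abc')`	`900150983cd24fb0d6963f7d28e17f72`
`pg_client_encoding()`	`name`	Current client encoding name	`pg_client_encoding()`	`SQL_ASCII`
`quote_ident(string text)`	`text`	Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (that is, if the string contains non-identifier characters or would be case-folded). Embedded double quotes are doubled. See also Example 39-1.	`quote_ident('Foo bar')`	`"Foo bar"`
`quote_literal(string text)`	`text`	Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single quotes and backslashes are doubled. Note that `quote_literal` returns null on null input; if the argument might be null, `quote_nullable` is often more suitable. See also Example 39-1.	`quote_literal(E'O\'Reilly')`	`'O''Reilly'`
`quote_literal(value anyelement)`	`text`	Coerce the given value to text and then quote it as a literal. Embedded single quotes and backslashes are doubled.	`quote_literal(42.5)`	`'42.5'`
`quote_nullable(string text)`	`text`	Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return `NULL`. Embedded single quotes and backslashes are doubled. See also Example 39-1.	`quote_nullable(NULL)`	`NULL`
`quote_nullable(value anyelement)`	`text`	Coerce the given value to text and then quote it as a literal; or, if the argument is null, return `NULL`. Embedded single quotes and backslashes are doubled.	`quote_nullable(42.5)`	`'42.5'`
`regexp_matches(string text, pattern text [, flags text])`	`setof text[]`	Return all substrings of `string` that match a POSIX regular expression. See Section 9.7.3 for more information.	`regexp_matches('foobarbequebaz', '(bar)(beque)')`	`{bar,beque}`
`regexp_replace(string text, pattern text, replacement text [, flags text])`	`text`	Replace substring(s) matching a POSIX regular expression. See Section 9.7.3 for more information.	`regexp_replace('Thomas', '.[mN]a.', 'M')`	`ThM`
`regexp_split_to_array(string text, pattern text [, flags text])`	`text[]`	Split `string` using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information.	`regexp_split_to_array('hello world', E'\\s+')`	`{hello,world}`
`regexp_split_to_table(string text, pattern text [, flags text])`	`setof text`	Split `string` using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information.	`regexp_split_to_table('hello world', E'\\s+')`	`hello` `world` (2 rows)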
`repeat(string text, number int)`	`text`	Repeat `string` the specified `number` of times	`repeat('Pg', 4)`	`PgPgPgPg`
`replace(string text, from text, to text)`	`text`	Replace all occurrences in `string` of substring `from` with substring `to`.	`replace('abcdefabcdef', 'cd', 'XX')`	`abXXefabXXef`
`rpad(string text, length int [, fill text])`	`text`	Fill up `string` to length `length` by appending the characters `fill` (a space by default). If `string` is already longer than `length`, it is truncated.	`rpad('hi', 5, 'xy')`	`hixyx`
`rtrim(string text [, characters text])`	`text`	Remove the longest string containing only characters from `characters` (a space by default) from the end of `string`.	`rtrim('trimxxxx', 'x')`	`trim`
`split_part(string text, delimiter text, field int)`	`text`	Split `string` on `delimiter` and return the given field (counting from one).	`split_part('abc~@~def~@~ghi', '~@~', 2)`	`def`
`strpos(string, substring)`	`int`	Location of specified substring (same as `position(substring in string)`, but note the reversed argument order)	`strpos('high', 'ig')`	`2`
`substr(string, from [, count])`	`text`	Extract substring (same as `substring(string from from for count)`)	`substr('alphabet', 3, 2)`	`ph`
`to_ascii(string text [, encoding text])`	`text`	Convert `string` to ASCII from another encoding (only conversions from `LATIN1`, `LATIN2`, `LATIN9`, and `WIN1250` are supported).	`to_ascii('Karel')`	`Karel`
`to_hex(number int or bigint)`	`text`	Convert `number` to its equivalent hexadecimal representation	`to_hex(2147483647)`	`7fffffff`
`translate(string text, from text, to text)`	`text`	Any character in `string` that matches a character in the `from` set is replaced by the corresponding character in the `to` set.	`translate('12345', '14', 'ax')`	`a23x5`
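A few of the functions in Table 9-6 are sketched below in situations where they tend to be useful. All literal values and the identifier "My Table" are invented for illustration. The `'g'` flag passed to `regexp_replace` is described in Section 9.7.3.

```sql
-- Pick one field out of a delimited string (fields are counted from 1).
SELECT split_part('2024-05-17', '-', 2);                      -- 05

-- Left-pad a number rendered as text to a fixed width.
SELECT lpad(CAST(42 AS text), 6, '0');                        -- 000042

-- Collapse runs of whitespace; 'g' replaces every match, not just the first.
SELECT regexp_replace('foo   bar  baz', E'\\s+', ' ', 'g');   -- foo bar baz

-- Build a statement string. quote_nullable renders a null as the word NULL,
-- whereas quote_literal would return null and make the whole result null.
SELECT 'UPDATE ' || quote_ident('My Table')
       || ' SET note = ' || quote_nullable(CAST(NULL AS text));
-- UPDATE "My Table" SET note = NULL

-- Round-trip binary data through two textual encodings.
SELECT encode(decode('MTIzAAE=', 'base64'), 'hex');           -- 3132330001
```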

See also the aggregate function `string_agg` in Section 9.18.
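For orientation, a minimal `string_agg` call might look like the sketch below. The inline rows and the column name `name` are hypothetical; in practice the values would come from a table. The `ORDER BY` inside the aggregate is included because the concatenation order is otherwise not guaranteed.

```sql
-- Concatenate several row values into one comma-separated string.
SELECT string_agg(name, ', ' ORDER BY name) AS names
FROM (VALUES ('carol'::text), ('alice'), ('bob')) AS t(name);
-- alice, bob, carol
```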

Table 9-7. Built-in Conversions

Conversion Name [a]	Source Encoding	Destination Encoding
`ascii_to_mic`	`SQL_ASCII`	`MULE_INTERNAL`
`ascii_to_utf8`	`SQL_ASCII`	`UTF8`
`big5_to_euc_tw`	`BIG5`	`EUC_TW`
`big5_to_mic`	`BIG5`	`MULE_INTERNAL`
`big5_to_utf8`	`BIG5`	`UTF8`
`euc_cn_to_mic`	`EUC_CN`	`MULE_INTERNAL`
`euc_cn_to_utf8`	`EUC_CN`	`UTF8`
`euc_jp_to_mic`	`EUC_JP`	`MULE_INTERNAL`
`euc_jp_to_sjis`	`EUC_JP`	`SJIS`
`euc_jp_to_utf8`	`EUC_JP`	`UTF8`
`euc_kr_to_mic`	`EUC_KR`	`MULE_INTERNAL`
`euc_kr_to_utf8`	`EUC_KR`	`UTF8`
`euc_tw_to_big5`	`EUC_TW`	`BIG5`
`euc_tw_to_mic`	`EUC_TW`	`MULE_INTERNAL`
`euc_tw_to_utf8`	`EUC_TW`	`UTF8`
`gb18030_to_utf8`	`GB18030`	`UTF8`
`gbk_to_utf8`	`GBK`	`UTF8`
`iso_8859_10_to_utf8`	`LATIN6`	`UTF8`
`iso_8859_13_to_utf8`	`LATIN7`	`UTF8`
`iso_8859_14_to_utf8`	`LATIN8`	`UTF8`
`iso_8859_15_to_utf8`	`LATIN9`	`UTF8`
`iso_8859_16_to_utf8`	`LATIN10`	`UTF8`
`iso_8859_1_to_mic`	`LATIN1`	`MULE_INTERNAL`
`iso_8859_1_to_utf8`	`LATIN1`	`UTF8`
`iso_8859_2_to_mic`	`LATIN2`	`MULE_INTERNAL`
`iso_8859_2_to_utf8`	`LATIN2`	`UTF8`
`iso_8859_2_to_windows_1250`	`LATIN2`	`WIN1250`
`iso_8859_3_to_mic`	`LATIN3`	`MULE_INTERNAL`
`iso_8859_3_to_utf8`	`LATIN3`	`UTF8`
`iso_8859_4_to_mic`	`LATIN4`	`MULE_INTERNAL`
`iso_8859_4_to_utf8`	`LATIN4`	`UTF8`
`iso_8859_5_to_koi8_r`	`ISO_8859_5`	`KOI8R`
`iso_8859_5_to_mic`	`ISO_8859_5`	`MULE_INTERNAL`
`iso_8859_5_to_utf8`	`ISO_8859_5`	`UTF8`
`iso_8859_5_to_windows_1251`	`ISO_8859_5`	`WIN1251`
`iso_8859_5_to_windows_866`	`ISO_8859_5`	`WIN866`
`iso_8859_6_to_utf8`	`ISO_8859_6`	`UTF8`
`iso_8859_7_to_utf8`	`ISO_8859_7`	`UTF8`
`iso_8859_8_to_utf8`	`ISO_8859_8`	`UTF8`
`iso_8859_9_to_utf8`	`LATIN5`	`UTF8`
`johab_to_utf8`	`JOHAB`	`UTF8`
`koi8_r_to_iso_8859_5`	`KOI8R`	`ISO_8859_5`
`koi8_r_to_mic`	`KOI8R`	`MULE_INTERNAL`
`koi8_r_to_utf8`	`KOI8R`	`UTF8`
`koi8_r_to_windows_1251`	`KOI8R`	`WIN1251`
`koi8_r_to_windows_866`	`KOI8R`	`WIN866`
`koi8_u_to_utf8`	`KOI8U`	`UTF8`
`mic_to_ascii`	`MULE_INTERNAL`	`SQL_ASCII`
`mic_to_big5`	`MULE_INTERNAL`	`BIG5`
`mic_to_euc_cn`	`MULE_INTERNAL`	`EUC_CN`
`mic_to_euc_jp`	`MULE_INTERNAL`	`EUC_JP`
`mic_to_euc_kr`	`MULE_INTERNAL`	`EUC_KR`
`mic_to_euc_tw`	`MULE_INTERNAL`	`EUC_TW`
`mic_to_iso_8859_1`	`MULE_INTERNAL`	`LATIN1`
`mic_to_iso_8859_2`	`MULE_INTERNAL`	`LATIN2`
`mic_to_iso_8859_3`	`MULE_INTERNAL`	`LATIN3`
`mic_to_iso_8859_4`	`MULE_INTERNAL`	`LATIN4`
`mic_to_iso_8859_5`	`MULE_INTERNAL`	`ISO_8859_5`
`mic_to_koi8_r`	`MULE_INTERNAL`	`KOI8R`
`mic_to_sjis`	`MULE_INTERNAL`	`SJIS`
`mic_to_windows_1250`	`MULE_INTERNAL`	`WIN1250`
`mic_to_windows_1251`	`MULE_INTERNAL`	`WIN1251`
`mic_to_windows_866`	`MULE_INTERNAL`	`WIN866`
`sjis_to_euc_jp`	`SJIS`	`EUC_JP`
`sjis_to_mic`	`SJIS`	`MULE_INTERNAL`
`sjis_to_utf8`	`SJIS`	`UTF8`
`tcvn_to_utf8`	`WIN1258`	`UTF8`
`uhc_to_utf8`	`UHC`	`UTF8`
`utf8_to_ascii`	`UTF8`	`SQL_ASCII`
`utf8_to_big5`	`UTF8`	`BIG5`
`utf8_to_euc_cn`	`UTF8`	`EUC_CN`
`utf8_to_euc_jp`	`UTF8`	`EUC_JP`
`utf8_to_euc_kr`	`UTF8`	`EUC_KR`
`utf8_to_euc_tw`	`UTF8`	`EUC_TW`
`utf8_to_gb18030`	`UTF8`	`GB18030`
`utf8_to_gbk`	`UTF8`	`GBK`
`utf8_to_iso_8859_1`	`UTF8`	`LATIN1`
`utf8_to_iso_8859_10`	`UTF8`	`LATIN6`
`utf8_to_iso_8859_13`	`UTF8`	`LATIN7`
`utf8_to_iso_8859_14`	`UTF8`	`LATIN8`
`utf8_to_iso_8859_15`	`UTF8`	`LATIN9`
`utf8_to_iso_8859_16`	`UTF8`	`LATIN10`
`utf8_to_iso_8859_2`	`UTF8`	`LATIN2`
`utf8_to_iso_8859_3`	`UTF8`	`LATIN3`
`utf8_to_iso_8859_4`	`UTF8`	`LATIN4`
`utf8_to_iso_8859_5`	`UTF8`	`ISO_8859_5`
`utf8_to_iso_8859_6`	`UTF8`	`ISO_8859_6`
`utf8_to_iso_8859_7`	`UTF8`	`ISO_8859_7`
`utf8_to_iso_8859_8`	`UTF8`	`ISO_8859_8`
`utf8_to_iso_8859_9`	`UTF8`	`LATIN5`
`utf8_to_johab`	`UTF8`	`JOHAB`
`utf8_to_koi8_r`	`UTF8`	`KOI8R`
`utf8_to_koi8_u`	`UTF8`	`KOI8U`
`utf8_to_sjis`	`UTF8`	`SJIS`
`utf8_to_tcvn`	`UTF8`	`WIN1258`
`utf8_to_uhc`	`UTF8`	`UHC`
`utf8_to_windows_1250`	`UTF8`	`WIN1250`
`utf8_to_windows_1251`	`UTF8`	`WIN1251`
`utf8_to_windows_1252`	`UTF8`	`WIN1252`
`utf8_to_windows_1253`	`UTF8`	`WIN1253`
`utf8_to_windows_1254`	`UTF8`	`WIN1254`
`utf8_to_windows_1255`	`UTF8`	`WIN1255`
`utf8_to_windows_1256`	`UTF8`	`WIN1256`
`utf8_to_windows_1257`	`UTF8`	`WIN1257`
`utf8_to_windows_866`	`UTF8`	`WIN866`
`utf8_to_windows_874`	`UTF8`	`WIN874`
`windows_1250_to_iso_8859_2`	`WIN1250`	`LATIN2`
`windows_1250_to_mic`	`WIN1250`	`MULE_INTERNAL`
`windows_1250_to_utf8`	`WIN1250`	`UTF8`
`windows_1251_to_iso_8859_5`	`WIN1251`	`ISO_8859_5`
`windows_1251_to_koi8_r`	`WIN1251`	`KOI8R`
`windows_1251_to_mic`	`WIN1251`	`MULE_INTERNAL`
`windows_1251_to_utf8`	`WIN1251`	`UTF8`
`windows_1251_to_windows_866`	`WIN1251`	`WIN866`
`windows_1252_to_utf8`	`WIN1252`	`UTF8`
`windows_1256_to_utf8`	`WIN1256`	`UTF8`
`windows_866_to_iso_8859_5`	`WIN866`	`ISO_8859_5`
`windows_866_to_koi8_r`	`WIN866`	`KOI8R`
`windows_866_to_mic`	`WIN866`	`MULE_INTERNAL`
`windows_866_to_utf8`	`WIN866`	`UTF8`
`windows_866_to_windows_1251`	`WIN866`	`WIN1251`
`windows_874_to_utf8`	`WIN874`	`UTF8`
`euc_jis_2004_to_utf8`	`EUC_JIS_2004`	`UTF8`
`utf8_to_euc_jis_2004`	`UTF8`	`EUC_JIS_2004`
`shift_jis_2004_to_utf8`	`SHIFT_JIS_2004`	`UTF8`
`utf8_to_shift_jis_2004`	`UTF8`	`SHIFT_JIS_2004`
`euc_jis_2004_to_shift_jis_2004`	`EUC_JIS_2004`	`SHIFT_JIS_2004`
`shift_jis_2004_to_euc_jis_2004`	`SHIFT_JIS_2004`	`EUC_JIS_2004`
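Notes: a. The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by `_to_`, followed by the similarly processed destination encoding name. Therefore, the names might differ from the customary encoding names.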
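One way to check a conversion name against its encodings is to look it up in the `pg_conversion` system catalog, as sketched below. The second query is a round trip through LATIN1 and assumes the database encoding can be converted to and from LATIN1 (for example, a UTF8 database); it will raise an error otherwise.

```sql
-- Look up one built-in conversion by name. Equality is used rather than
-- LIKE because '_' acts as a wildcard in LIKE patterns.
SELECT conname,
       pg_encoding_to_char(conforencoding) AS source_encoding,
       pg_encoding_to_char(contoencoding)  AS destination_encoding
FROM pg_conversion
WHERE conname = 'utf8_to_iso_8859_1';

-- Convert text to LATIN1 bytes and back to the database encoding.
SELECT convert_from(convert_to('some text', 'LATIN1'), 'LATIN1');
```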
