Preface
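Most programmers who process text eventually hit inconsistent full-width and half-width characters, and need a quick way to convert between the two. Full-width and half-width characters map onto each other systematically, so the conversion is not complicated.
The rules are:
Full-width characters occupy Unicode code points 65281 to 65374 (hex 0xFF01 to 0xFF5E).
Half-width characters occupy Unicode code points 33 to 126 (hex 0x21 to 0x7E).
The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).
Apart from the space, the two ranges appear in the same order, so each full-width character equals its half-width counterpart plus 65248 (0xFEE0).
Non-space characters can therefore be converted by adding or subtracting this offset, and the space is handled separately.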
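As a quick sanity check of these rules, the sketch below (assuming Python 3 and its standard `unicodedata` module; the helper name `offset_mismatches` is hypothetical) shifts every half-width character by 65248 and asks the Unicode database whether the result is a FULLWIDTH form that normalizes back to the original under NFKC. It also shows why the space needs special handling: 0x20 + 0xFEE0 lands on U+FF00, which is not the full-width space.

```python
import unicodedata

OFFSET = 0xFEE0  # 65248, the half-width -> full-width distance from the rules above


def offset_mismatches():
    """Return half-width code points (33..126) whose shifted partner breaks the rule."""
    bad = []
    for code in range(0x21, 0x7F):
        half, full = chr(code), chr(code + OFFSET)
        # The shifted character should be a FULLWIDTH form whose
        # compatibility (NFKC) normalization is the original character.
        if not unicodedata.name(full).startswith("FULLWIDTH"):
            bad.append(code)
        elif unicodedata.normalize("NFKC", full) != half:
            bad.append(code)
    return bad


print(offset_mismatches())                              # []
# The space is the exception: 0x20 + OFFSET is not U+3000 ...
print(hex(0x20 + OFFSET))                               # 0xff00
# ... but U+3000 (IDEOGRAPHIC SPACE) still normalizes to an ASCII space.
print(unicodedata.normalize("NFKC", "\u3000") == " ")  # True
```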
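Functions used
chr(): takes an integer in range(256) (that is, 0 to 255) and returns the corresponding character.
unichr(): like chr(), but returns a Unicode character.
ord(): the counterpart of chr() and unichr(); it takes a character (a string of length 1) and returns its ASCII or Unicode code point.
These descriptions apply to Python 2. In Python 3, unichr() was removed and chr() accepts any Unicode code point, so chr() and ord() are all you need.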
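Under Python 3 the offset can be tried directly with `ord()` and `chr()`. This small sketch (the constant name `FULLWIDTH_A` is only for illustration) moves between the half-width and full-width forms of the letter A.

```python
# Python 3: chr() covers the whole Unicode range, so it also does unichr()'s job.
FULLWIDTH_A = "\uff21"                         # 'Ａ', FULLWIDTH LATIN CAPITAL LETTER A

print(ord("A"))                                # 65
print(ord(FULLWIDTH_A))                        # 65313
print(chr(ord("A") + 65248) == FULLWIDTH_A)    # True
print(chr(ord(FULLWIDTH_A) - 65248))           # A
```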
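First, print the mapping:
```python
# Print each half-width character (33..126) next to its full-width counterpart.
# Python 3: range() replaces xrange() and chr() replaces unichr().
for i in range(33, 127):
    print(i, chr(i), i + 65248, chr(i + 65248))
```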
Output:
```text
33 ! 65281 ！
34 " 65282 ＂
35 # 65283 ＃
36 $ 65284 ＄
37 % 65285 ％
38 & 65286 ＆
39 ' 65287 ＇
40 ( 65288 （
41 ) 65289 ）
42 * 65290 ＊
43 + 65291 ＋
44 , 65292 ，
45 - 65293 －
46 . 65294 ．
47 / 65295 ／
48 0 65296 ０
49 1 65297 １
50 2 65298 ２
51 3 65299 ３
52 4 65300 ４
53 5 65301 ５
54 6 65302 ６
55 7 65303 ７
56 8 65304 ８
57 9 65305 ９
58 : 65306 ：
59 ; 65307 ；
60 < 65308 ＜
61 = 65309 ＝
62 > 65310 ＞
63 ? 65311 ？
64 @ 65312 ＠
65 A 65313 Ａ
66 B 65314 Ｂ
67 C 65315 Ｃ
68 D 65316 Ｄ
69 E 65317 Ｅ
70 F 65318 Ｆ
71 G 65319 Ｇ
72 H 65320 Ｈ
73 I 65321 Ｉ
74 J 65322 Ｊ
75 K 65323 Ｋ
76 L 65324 Ｌ
77 M 65325 Ｍ
78 N 65326 Ｎ
79 O 65327 Ｏ
80 P 65328 Ｐ
81 Q 65329 Ｑ
82 R 65330 Ｒ
83 S 65331 Ｓ
84 T 65332 Ｔ
85 U 65333 Ｕ
86 V 65334 Ｖ
87 W 65335 Ｗ
88 X 65336 Ｘ
89 Y 65337 Ｙ
90 Z 65338 Ｚ
91 [ 65339 ［
92 \ 65340 ＼
93 ] 65341 ］
94 ^ 65342 ＾
95 _ 65343 ＿
96 ` 65344 ｀
97 a 65345 ａ
98 b 65346 ｂ
99 c 65347 ｃ
100 d 65348 ｄ
101 e 65349 ｅ
102 f 65350 ｆ
103 g 65351 ｇ
104 h 65352 ｈ
105 i 65353 ｉ
106 j 65354 ｊ
107 k 65355 ｋ
108 l 65356 ｌ
109 m 65357 ｍ
110 n 65358 ｎ
111 o 65359 ｏ
112 p 65360 ｐ
113 q 65361 ｑ
114 r 65362 ｒ
115 s 65363 ｓ
116 t 65364 ｔ
117 u 65365 ｕ
118 v 65366 ｖ
119 w 65367 ｗ
120 x 65368 ｘ
121 y 65369 ｙ
122 z 65370 ｚ
123 { 65371 ｛
124 | 65372 ｜
125 } 65373 ｝
126 ~ 65374 ～
```
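Converting full-width to half-width:
```python
def full2half(s):
    """Convert full-width characters in s to half-width; other characters are kept."""
    if isinstance(s, bytes):              # also accept UTF-8 encoded bytes
        s = s.decode('utf-8')
    n = []
    for char in s:
        num = ord(char)
        if num == 0x3000:                 # full-width (ideographic) space
            num = 32
        elif 0xFF01 <= num <= 0xFF5E:     # full-width variants of ASCII 0x21..0x7E
            num -= 0xfee0
        n.append(chr(num))                # Python 3: chr() replaces unichr()
    return ''.join(n)
```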
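Converting half-width to full-width:
```python
def half2full(s):
    """Convert half-width characters in s to full-width; other characters are kept."""
    if isinstance(s, bytes):              # also accept UTF-8 encoded bytes
        s = s.decode('utf-8')
    n = []
    for char in s:
        num = ord(char)
        if num == 32:                     # half-width space
            num = 0x3000
        elif 0x21 <= num <= 0x7E:         # printable ASCII except the space
            num += 0xfee0
        n.append(chr(num))                # Python 3: chr() replaces unichr()
    return ''.join(n)
```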
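Since the mapping is fixed, one possible variant (a sketch, not the author's code; the names `FULL_TO_HALF`, `full2half_table` and so on are hypothetical) precomputes both directions as code-point tables and hands them to Python 3's `str.translate()`, which accepts a dict from code points to code points. It applies the same rules as `full2half()` and `half2full()` without building a list character by character.

```python
# Full-width -> half-width table: the space special case plus the 0xFEE0 offset range.
FULL_TO_HALF = {0x3000: 0x20}
FULL_TO_HALF.update({cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)})
# The mapping is one-to-one, so the reverse table is just the inverted dict.
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}


def full2half_table(s: str) -> str:
    """Full-width -> half-width via a prebuilt translation table."""
    return s.translate(FULL_TO_HALF)


def half2full_table(s: str) -> str:
    """Half-width -> full-width via the inverted table."""
    return s.translate(HALF_TO_FULL)


print(full2half_table("Ｐｙｔｈｏｎ３，　ｏｋ！"))  # Python3, ok!
print(half2full_table("Python3, ok!"))             # Ｐｙｔｈｏｎ３，　ｏｋ！
```

Building the tables once means each call is a single `translate()` pass over the string.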
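The implementation above is very simple, but in practice you usually don't want to convert every character the same way. In a Chinese article, for example, we may want all letters and digits in half-width while common punctuation stays full-width, and the functions above can't do that.
The solution is a custom mapping dictionary.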
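The idea behind the custom dictionary can be sketched in a few lines before looking at the full program. This minimal version (hypothetical names, Python 3 assumed) merges a "letters and digits to half-width" table with a "some punctuation to full-width" table and applies both with one `str.translate()` call.

```python
# Hypothetical policy for Chinese prose: full-width letters/digits -> half-width,
# a few half-width punctuation marks -> full-width. "." is deliberately left out
# so decimals such as 3.14 survive (a "," rule carries a similar risk for 1,000).
TO_HALF_ALNUM = {
    cp: cp - 0xFEE0
    for cp in range(0xFF01, 0xFF5F)
    if chr(cp - 0xFEE0).isalnum()          # only A-Z, a-z, 0-9 qualify
}
TO_FULL_PUNCT = {ord(h): f for h, f in {",": "，", "?": "？", "!": "！"}.items()}
POLICY = {**TO_HALF_ALNUM, **TO_FULL_PUNCT}


def apply_policy(text: str) -> str:
    """Apply the combined code-point table in a single pass."""
    return text.translate(POLICY)


print(apply_policy("版本Ｖ２，发布于2017年,对吗?"))  # 版本V2，发布于2017年，对吗？
```

Because `str.translate()` maps each character exactly once, a comma produced by one rule is never converted back by another; the sequential `replace()` approach in the program that follows guards against that with its `skip` option.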
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Each table is a tuple of (full-width, half-width) pairs.
FH_SPACE = FHS = (("\u3000", " "),)

FH_NUM = FHN = (
    ("０", "0"), ("１", "1"), ("２", "2"), ("３", "3"), ("４", "4"),
    ("５", "5"), ("６", "6"), ("７", "7"), ("８", "8"), ("９", "9"),
)

FH_ALPHA = FHA = (
    ("ａ", "a"), ("ｂ", "b"), ("ｃ", "c"), ("ｄ", "d"), ("ｅ", "e"),
    ("ｆ", "f"), ("ｇ", "g"), ("ｈ", "h"), ("ｉ", "i"), ("ｊ", "j"),
    ("ｋ", "k"), ("ｌ", "l"), ("ｍ", "m"), ("ｎ", "n"), ("ｏ", "o"),
    ("ｐ", "p"), ("ｑ", "q"), ("ｒ", "r"), ("ｓ", "s"), ("ｔ", "t"),
    ("ｕ", "u"), ("ｖ", "v"), ("ｗ", "w"), ("ｘ", "x"), ("ｙ", "y"), ("ｚ", "z"),
    ("Ａ", "A"), ("Ｂ", "B"), ("Ｃ", "C"), ("Ｄ", "D"), ("Ｅ", "E"),
    ("Ｆ", "F"), ("Ｇ", "G"), ("Ｈ", "H"), ("Ｉ", "I"), ("Ｊ", "J"),
    ("Ｋ", "K"), ("Ｌ", "L"), ("Ｍ", "M"), ("Ｎ", "N"), ("Ｏ", "O"),
    ("Ｐ", "P"), ("Ｑ", "Q"), ("Ｒ", "R"), ("Ｓ", "S"), ("Ｔ", "T"),
    ("Ｕ", "U"), ("Ｖ", "V"), ("Ｗ", "W"), ("Ｘ", "X"), ("Ｙ", "Y"), ("Ｚ", "Z"),
)

FH_PUNCTUATION = FHP = (
    ("．", "."), ("，", ","), ("！", "!"), ("？", "?"), ("”", '"'),
    ("＇", "'"), ("‘", "`"), ("＠", "@"), ("＿", "_"), ("：", ":"),
    ("；", ";"), ("＃", "#"), ("＄", "$"), ("％", "%"), ("＆", "&"),
    ("（", "("), ("）", ")"), ("‐", "-"), ("＝", "="), ("＊", "*"),
    ("＋", "+"), ("－", "-"), ("／", "/"), ("＜", "<"), ("＞", ">"),
    ("［", "["), ("￥", "\\"), ("］", "]"), ("＾", "^"), ("｛", "{"),  # yen sign -> backslash
    ("｜", "|"), ("｝", "}"), ("～", "~"),
)

# Callables so a fresh generator is produced every time a map is used.
FH_ASCII = HAC = lambda: ((fr, to) for m in (FH_ALPHA, FH_NUM, FH_PUNCTUATION) for fr, to in m)

# Half-width -> full-width maps are the inverted full-width tables.
HF_SPACE = HFS = ((" ", "\u3000"),)
HF_NUM = HFN = lambda: ((h, z) for z, h in FH_NUM)
HF_ALPHA = HFA = lambda: ((h, z) for z, h in FH_ALPHA)
HF_PUNCTUATION = HFP = lambda: ((h, z) for z, h in FH_PUNCTUATION)
HF_ASCII = ZAC = lambda: ((h, z) for z, h in FH_ASCII())


def convert(text, *maps, **ops):
    """Full-width/half-width conversion.

    args:
        text: unicode string to convert
        maps: conversion maps (tuples of pairs, dicts, or callables returning pairs)
        skip: characters to leave unconverted, as a tuple or a string
    return: converted unicode string
    """
    if "skip" in ops:
        skip = ops["skip"]
        if isinstance(skip, str):
            skip = tuple(skip)

        def replace(text, fr, to):
            return text if fr in skip else text.replace(fr, to)
    else:
        def replace(text, fr, to):
            return text.replace(fr, to)

    # Maps are applied one after another, each pair as a plain str.replace().
    for m in maps:
        if callable(m):
            m = m()
        elif isinstance(m, dict):
            m = m.items()
        for fr, to in m:
            text = replace(text, fr, to)
    return text


if __name__ == '__main__':
    text = "成田空港—【JR特急成田エクスプレス号・横浜行,2站】—東京—【JR新幹線はやぶさ号・新青森行,6站 】—新青森—【JR特急スーパー白鳥号・函館行,4站 】—函館"
    print(convert(text, FH_ASCII,
                  {"【": "[", "】": "]", ",": "，", ".": "。", "?": "？", "!": "！"},
                  skip="，。？！“”"))
```
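Note: in the English (ASCII) convention, the quotation mark does not distinguish opening from closing quotes, which is why the curly ” is mapped to the straight ".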
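For comparison, Python's standard library can also fold full-width forms to half-width through Unicode NFKC normalization (`unicodedata.normalize`). This is a different technique from the mapping tables above, sketched here assuming Python 3 (the name `nfkc_fold` is hypothetical): it handles the full-width space and all of U+FF01 to U+FF5E, but it also rewrites other compatibility characters (circled digits, ligatures, half-width katakana such as ｶ becoming カ), and it leaves curly quotes untouched, so the quote handling discussed above still needs an explicit mapping.

```python
import unicodedata


def nfkc_fold(text: str) -> str:
    """Fold full-width forms to half-width using Unicode NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


print(nfkc_fold("ＪＲ特急，６站"))   # JR特急,6站
print(nfkc_fold("①ﬁ"))             # 1fi   (other compatibility characters change too)
print(nfkc_fold("“引号”"))           # “引号”  (curly quotes are left alone)
```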
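Summary
That covers converting between full-width and half-width characters in Python. I hope this article helps with your study or work; if you have any questions, feel free to leave a comment.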
Original article: http://www.biaodianfu.com/python-convert-between-unicode-fullwidth-halfwidth-characters.html