Building a Word-Lookup Script in Java and Python with Youdao Dictionary

November 13, 2012

Here are two screenshots of the result first.
Java:

Output of the Java version

Python:

Output of the Python version

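Today I had a sudden idea to build a word-lookup tool, so I went straight to the Youdao Dictionary website to take a look. It turns out that the word being looked up is embedded in the page URL sent to Youdao Dictionary, and the page that comes back contains the definitions we want. So the only technical knowledge this needs is: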

Regular expressions
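
The lookup URL described above can be sketched as follows. This is a minimal illustration in Python 3, assuming the same endpoint and the `keyfrom=dict.index` parameter that appear in the scripts in this post; the helper name `build_search_url` is hypothetical. Letting `urlencode` build the query string avoids hand-escaping spaces in phrases such as "join in".

```python
from urllib.parse import urlencode

BASE_URL = "http://dict.youdao.com/search"  # endpoint used by the scripts in this post


def build_search_url(word):
    """Return the lookup URL for a word or phrase (hypothetical helper)."""
    # urlencode percent-encodes unsafe characters and turns spaces into '+'
    query = urlencode({"q": word, "keyfrom": "dict.index"})
    return BASE_URL + "?" + query


print(build_search_url("join in"))
```

The same function works for single words and for phrases, since the encoding step handles both.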

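All we have to do is extract the definitions from the page source we fetch, so this post only covers the regular expressions used for that.
Looking at the page source, we can see that the definitions all sit inside a single div tag.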

The first goal is to grab this part. The regular expression can be written like this:

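(?s)<div …>.*?</div>

// (?s) makes '.' match line terminators as well; by default it does not
// .*? matches any number of characters in non-greedy mode

Once we have this part, what we really need is the definitions inside it, so we can go one step further: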

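(?m)<…>(.*?)</…>

// (?m) turns on multiline mode, in which ^ and $ match at the start and end of every line; by default they match only at the start and end of the whole input
// The parentheses around .*? let us read the definition itself directly and drop the tags on either side

Here is the actual code: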
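
Before the full programs, here is a small offline sketch of what the flags and the lazy quantifier do. It uses Python 3 and a made-up HTML fragment; the tag names and the `trans-container` class are assumptions for illustration only, and the real page may use different markup.

```python
import re

# Hypothetical sample; the real page's tag names and attributes may differ.
html = (
    '<div class="trans-container">\n'
    "<ul>\n"
    "<li>int. hello</li>\n"
    "<li>n. a greeting</li>\n"
    "</ul>\n"
    "</div>\n"
    "<div>unrelated</div>\n"
)

# Without (?s), '.' cannot cross the newline after the opening tag: no match.
no_dotall = re.search(r'<div class="trans-container">.*?</div>', html)

# With (?s), the lazy .*? stops at the first </div> it can reach.
lazy = re.search(r'(?s)<div class="trans-container">.*?</div>', html)

# A greedy .* would instead run on to the last </div> in the input.
greedy = re.search(r'(?s)<div class="trans-container">.*</div>', html)

block = lazy.group()

# One capture group, so findall returns only the captured text.
meanings = re.findall(r"<li>(.*?)</li>", block)

# (?m) only matters once ^ or $ is used: it lets them match on every line.
with_m = re.findall(r"(?m)^<li>(.*?)</li>$", block)
without_m = re.findall(r"^<li>(.*?)</li>$", block)

print(meanings)
print(with_m, without_m)
```

In this sample the anchored pattern finds both items only when `(?m)` is present, because without it `^` matches only at the very start of the string.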

1. Java code
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) throws IOException {
        CloseableHttpClient httpClient = HttpClients.createDefault();

        System.out.print("Enter the word to look up: ");
        Scanner s = new Scanner(System.in);
        String word = s.nextLine();
        word = word.replaceAll(" ", "+");

        // Build the lookup URL from the word
        HttpGet getWordMean = new HttpGet("http://dict.youdao.com/search?q=" + word + "&keyfrom=dict.index");
        CloseableHttpResponse response = httpClient.execute(getWordMean); // fetch the page source

        String result = EntityUtils.toString(response.getEntity());
        response.close();
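        // Note (?s): it makes '.' match line terminators, which it does not do by default
        Pattern searchMeanPattern = Pattern.compile("(?s)<div …>.*?</div>");
        Matcher m1 = searchMeanPattern.matcher(result); // m1 matches the whole div that contains the translation

        if (m1.find()) {
            String means = m1.group(); // all definitions, including the HTML tags
            Pattern getChinese = Pattern.compile("(?m)<…>(.*?)</…>"); // (?m) turns on multiline mode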
            Matcher m2 = getChinese.matcher(means);

            System.out.println("Definitions:");
            while (m2.find()) {
                // In Java, (.*?) is group 1, so use group(1)
                System.out.println("\t" + m2.group(1));
            }
        } else {
            System.out.println("No definition found.");
            System.exit(0);
        }
    }
}

2. Python code
#!/usr/bin/python
#coding:utf-8
import urllib
import sys
import re

if len(sys.argv) == 1:  # no word given, so print the usage
    print "Usage: ./Dict.py word-to-look-up"
    sys.exit()

# The query may be a phrase with spaces in it, such as "join in", so join the arguments
word = " ".join(sys.argv[1:])
print "Word: " + word

searchUrl = "http://dict.youdao.com/search?q=" + word + "&keyfrom=dict.index"  # lookup URL
response = urllib.urlopen(searchUrl).read()  # fetch the page source

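# Pull the part containing the definitions out of the page source
searchSuccess = re.search(r"(?s)<div …>.*?</div>", response)

if searchSuccess:
    # Extract the definitions themselves; with a single group, findall returns a list of that group's strings
    means = re.findall(r"(?m)<…>(.*?)</…>", searchSuccess.group())
    print "Definitions:"
    for mean in means:
        print "\t" + mean  # print each definition
else:
    print "No definition found."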
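
For readers on Python 3, the same two-step extraction might look roughly like the sketch below. It is not the script above: the function names are hypothetical, and the tag names in the two patterns (a div with class `trans-container` holding `li` items) are assumptions that should be checked against the live page source. Fetching and parsing are kept separate so the parsing step can be tried offline.

```python
import re
import urllib.request
from urllib.parse import urlencode

# Assumed tag names; adjust them to whatever the live page actually uses.
BLOCK_RE = re.compile(r'(?s)<div class="trans-container">.*?</div>')
ITEM_RE = re.compile(r"<li>(.*?)</li>")


def extract_meanings(page):
    """Return the definitions found in the page source, or an empty list."""
    block = BLOCK_RE.search(page)
    if block is None:
        return []
    return [item.strip() for item in ITEM_RE.findall(block.group())]


def look_up(word):
    """Fetch the result page for a word and extract its definitions."""
    query = urlencode({"q": word, "keyfrom": "dict.index"})
    url = "http://dict.youdao.com/search?" + query
    with urllib.request.urlopen(url, timeout=10) as response:
        page = response.read().decode("utf-8", errors="replace")
    return extract_meanings(page)


# Offline check against a hypothetical page fragment.
sample = '<div class="trans-container"><ul>\n<li> n. test </li>\n</ul></div>'
print(extract_meanings(sample))
```

Calling `look_up("join in")` would perform the network request; whether it returns anything depends on the site still serving markup that matches the assumed patterns.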


About The Author

bjmayor

Programmer, coder, PHP, Python, iOS, Android, Go, product manager, entrepreneur.
