From WordPress to Hexo

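Finally, more than two months later, I found the time to move the WordPress entry point over to the Hexo static blog completely. Links that used to point to WordPress now redirect to this static blog, so don't be alarmed if you end up on this page.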

URL compatibility

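I had serious misgivings before doing this, because I was worried I would break the blog. The most-visited pages get most of their traffic from search engines. At the very least, those pages had to keep working by redirecting to their new counterparts.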

The old WordPress links identified each post by the p query parameter, as in http://kyleduo.com/?p=278. The plan was therefore to build a table that maps each id to its new URL.
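
Because the id lives in the `p` query parameter, the standard library can extract it more reliably than string slicing. The sketch below assumes the analytics export yields full or path-only URLs. `extract_post_id` and `collect_ids` are hypothetical helper names, and the sample URLs are made up:

```python
from urllib.parse import urlparse, parse_qs


def extract_post_id(url):
    """Return the WordPress post id from a URL like http://kyleduo.com/?p=278, or None."""
    values = parse_qs(urlparse(url).query).get("p")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


def collect_ids(urls):
    """Extract ids from visited URLs and merge them into a sorted, de-duplicated list."""
    ids = {extract_post_id(u) for u in urls}
    ids.discard(None)  # URLs without a numeric ?p= are not old post links
    return sorted(ids)


if __name__ == "__main__":
    # Hypothetical rows standing in for the page column of an analytics CSV export.
    visited = [
        "http://kyleduo.com/?p=278",
        "http://kyleduo.com/?p=301&replytocom=5",
        "http://kyleduo.com/?p=278",
        "http://kyleduo.com/about",
    ]
    print(collect_ids(visited))  # [278, 301]
```

Extra query parameters (such as comment replies) are ignored, so several variants of the same post collapse into one id.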

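First, I pulled the pages visited over the past month from my analytics data. I exported them as CSV, extracted the ids, and merged them into an array of ids.
Then I went through every file in the Hexo source, read its id, and checked whether it was in that array. If it was, I printed the new URL according to Hexo's URL rules.
Finally, I wrote a route file on the VPS that implements the mapping.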
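
The route file itself is not shown. Judging from the `$p_map[...]` lines printed by the first script, it was most likely PHP. The same lookup is sketched here in Python as a WSGI callable. `P_MAP`, its sample entry, `resolve_redirect`, and the fall-back-to-home-page behavior are all assumptions for illustration:

```python
from urllib.parse import parse_qs

# Hypothetical id -> Hexo path table, same shape as the $p_map lines printed by the first script.
P_MAP = {
    278: "2015/03/12/hello-hexo",
}


def resolve_redirect(query_string, p_map=P_MAP):
    """Return the new path for a legacy ?p=<id> query string, or "/" if the id is unknown."""
    values = parse_qs(query_string).get("p", [])
    if values and values[0].isdigit():
        target = p_map.get(int(values[0]))
        if target:
            return "/" + target + "/"
    return "/"  # assumption: unknown ids fall back to the home page


def app(environ, start_response):
    """WSGI callable that answers every legacy request with a permanent redirect."""
    location = resolve_redirect(environ.get("QUERY_STRING", ""))
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]
```

To try it locally, run `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. A 301 tells search engines the move is permanent, which is the point of keeping the old links alive.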

Static images

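Once that was done, old blog URLs mapped to the new content. The next step was to upload the images stored under WordPress to a CDN. This was partly for speed, and partly because nginx's root now points at the route directory, so images under the WordPress directory can no longer be served.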

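Go through the Hexo source files, find every image hosted on the VPS, and upload it to a CDN bucket with Qiniu's Python SDK.
Replace the image links in the Hexo source files with a regular expression.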
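
Here is a minimal sketch of the replacement step. It assumes old image links look like `http://kyleduo.com/wp-content/uploads/<path>` (optionally with `www.`) and that each image was uploaded under its path below `uploads/`. `CDN_BASE` is a placeholder for your own static domain, and the function names are hypothetical:

```python
import os
import re

# Placeholder: replace with your own static (CDN) domain.
CDN_BASE = "http://cdn.example.com/"
# Group 1 is the path under uploads/, assumed to be the key the image was uploaded with.
OLD_IMAGE = re.compile(r"https?://(?:www\.)?kyleduo\.com/wp-content/uploads/([^\s\])]+?\.(?:png|jpg))")


def rewrite_image_links(text, cdn_base=CDN_BASE):
    """Point every old uploads/ image URL at the CDN, keeping the path under uploads/."""
    return OLD_IMAGE.sub(lambda m: cdn_base + m.group(1), text)


def rewrite_dir(path):
    """Rewrite all Markdown files in a Hexo source directory in place."""
    for name in os.listdir(path):
        if not name.endswith(".md"):
            continue
        full = os.path.join(path, name)
        with open(full, encoding="utf-8") as f:
            original = f.read()
        updated = rewrite_image_links(original)
        if updated != original:  # only touch files that actually changed
            with open(full, "w", encoding="utf-8") as f:
                f.write(updated)
```

The non-greedy `+?` stops at the first `.png` or `.jpg`, so two images on the same line are rewritten separately.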

Implementation & Summary

The approach above is straightforward. Two technical points are worth noting: Python and regular expressions.

I used two Python scripts. One outputs the id-to-url mapping table, and the other finds image URLs and uploads the images to the CDN.

Output the id-to-url mapping table

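import os

path = 'path/to/hexo/posts'  # placeholder: the Hexo post source directory
needIds = []  # placeholder: the ids that need mapping

def mapUrl(filename):
    """Print a $p_map line for a post and return its new path, or '' if it is not needed."""
    postId = ''
    datePath = ''
    with open(os.path.join(path, filename), encoding='utf-8') as file:
        for line in file:
            if line.startswith('id: '):
                postId = line[4:].strip()
                if int(postId) not in needIds:
                    return ''
            elif line.startswith('date: '):
                # "date: 2015-03-12 10:00:00" -> "2015/03/12"
                datePath = line[6:10] + '/' + line[11:13] + '/' + line[14:16]
            # stop once both fields are known, whichever comes first
            if postId and datePath:
                break
    if postId == '' or datePath == '':
        return ''
    p = filename[:-3]  # strip the ".md" extension to get the slug
    # emit one line of the id -> url map for the route file
    print('$p_map[' + postId + '] = "' + datePath + '/' + p + '";')
    return datePath + '/' + p

for filename in os.listdir(path):
    if filename.endswith('.md'):
        mapUrl(filename)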
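
The slicing above relies on `date:` being written exactly as `YYYY-MM-DD HH:MM:SS`. Below is a somewhat more defensive sketch. It assumes Hexo's default permalink `:year/:month/:day/:title/` and that the file name without `.md` is the title slug. `hexo_path` is a hypothetical helper, not part of Hexo:

```python
from datetime import datetime


def hexo_path(text, filename):
    """Return (post_id, 'YYYY/MM/DD/slug') from a post's source text, or None if id/date is missing."""
    fields = {}
    started = False
    for line in text.splitlines():
        if line.strip() == "---":
            if started:
                break  # closing delimiter: ignore "id:"/"date:" lines in the post body
            continue
        started = True
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("id", "date") and key not in fields:
            fields[key] = value.strip()
    if "id" not in fields or "date" not in fields or not fields["id"].isdigit():
        return None
    # Only the date part matters for the URL; any time part is ignored.
    date = datetime.strptime(fields["date"][:10], "%Y-%m-%d")
    slug = filename[:-3] if filename.endswith(".md") else filename
    return int(fields["id"]), date.strftime("%Y/%m/%d") + "/" + slug
```

Because it reads the whole front-matter block before deciding, the order of the `id:` and `date:` lines no longer matters.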

Find image URLs and upload them to the CDN

import os
import re
from qiniu import Auth, put_file
import qiniu.config

path = 'path/to/hexo/posts'  # placeholder: the Hexo source directory
imagePath = 'path/to/uploads'  # placeholder: WordPress's uploads directory, downloaded beforehand

# Fill in your Access Key and Secret Key
access_key = 'AK'
secret_key = 'SK'

# Build the auth object
q = Auth(access_key, secret_key)

# Bucket to upload to
bucket_name = 'bucket'

# Non-capturing groups: with (png|jpg), findall would return only "png"/"jpg", not the URL
urlPattern = re.compile(r'http://[^\s\])]*?\.(?:png|jpg)')
pathPattern = re.compile(r'(?<=uploads/).*\.(?:png|jpg)')

def upload(localFile, key):
    token = q.upload_token(bucket_name, key, 3600)
    ret, info = put_file(token, key, localFile)
    assert ret is not None and ret['key'] == key, info
    print('Uploaded: ' + key)

def scanFile(filename):
    with open(os.path.join(path, filename), encoding='utf-8') as f:
        for line in f:
            for m in urlPattern.findall(line):
                if 'static.kyleduo' in m:
                    print('skip uploaded')
                    continue
                print('=======================================')
                print('Found image: ' + m)
                urlPath = pathPattern.search(m)
                if urlPath is None:
                    continue
                key = urlPath.group()  # path under uploads/, reused as the CDN key
                localFile = os.path.join(imagePath, key)
                if not os.path.exists(localFile):
                    print('File not exists! skip...')
                    continue
                print('uploading: ')
                upload(localFile, key)

for filename in os.listdir(path):
    print(filename)
    scanFile(filename)
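
Before touching the bucket, it helps to check what the scan would upload. The matching logic can be pulled out into a pure function for that. This sketch assumes image links are plain http(s) URLs inside Markdown `![]()` syntax. `upload_keys` is a hypothetical name, and the `skip_host` default mirrors the script's `static.kyleduo` check. One pitfall is worth knowing: when a pattern contains a capturing group such as `(png|jpg)`, `re.findall` returns only that group's text, so both patterns here use non-capturing groups.

```python
import re

# (?:...) keeps findall returning the whole URL instead of just "png"/"jpg".
IMAGE_URL = re.compile(r"https?://[^\s\])]+?\.(?:png|jpg)")
UPLOAD_KEY = re.compile(r"(?<=uploads/)[^\s\])]+\.(?:png|jpg)")


def upload_keys(text, skip_host="static.kyleduo"):
    """Unique keys (paths under uploads/) of images not yet on the CDN, in first-seen order."""
    keys = []
    for url in IMAGE_URL.findall(text):
        if skip_host in url:
            continue  # already points at the CDN
        m = UPLOAD_KEY.search(url)
        if m and m.group() not in keys:
            keys.append(m.group())
    return keys


sample = ("![a](http://kyleduo.com/wp-content/uploads/2014/05/a.png) "
          "![b](http://static.kyleduo.com/b.jpg) "
          "![c](http://kyleduo.com/wp-content/uploads/2014/05/a.png)")
print(upload_keys(sample))  # ['2014/05/a.png']
```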

Regular expressions

I like how logical regular expressions are. This time I mainly used one to replace the image URLs:

(/www\.|/)kyleduo\.com/wp-content/uploads(?=[^\]]*\.(png|jpg))

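This expression matches the old URL prefix, but only when the lookahead finds a .png or .jpg further along the link. I then replaced the matched prefix with the corresponding prefix on the static domain.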
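
Here is a minimal sketch of applying this kind of pattern with Python's `re.sub`. In Python the dots need escaping and the character class is written `[^\]]` ("anything but `]`"). `STATIC_PREFIX` is a placeholder for the real static domain, and `to_static` is a hypothetical name:

```python
import re

OLD_PREFIX = re.compile(r"(/www\.|/)kyleduo\.com/wp-content/uploads(?=[^\]]*\.(png|jpg))")

# Placeholder domain. The match starts at the second slash of "//",
# so the replacement must start with "/" as well.
STATIC_PREFIX = "/static.example.com"


def to_static(line):
    """Swap the old uploads prefix for the static-domain prefix on image links."""
    return OLD_PREFIX.sub(STATIC_PREFIX, line)


print(to_static("![shot](http://www.kyleduo.com/wp-content/uploads/2014/05/a.png)"))
# ![shot](http://static.example.com/2014/05/a.png)
```

Because the lookahead does not consume anything, the path after `uploads` is left untouched. That only works if the images were uploaded under the same relative paths.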