ThinkPHP5: scraping data page by page and saving it to the database
copylian
2018.10.09
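PHP can fetch data from other sites' APIs with curl. The fetched data set is sometimes large, and inserting all of it into the database in a single request can run so long that the script is killed and the server returns a 500 error. There are two ways around this: 1. raise PHP's max_execution_time setting, and raise the nginx timeout as well, although the page then hangs until the job finishes and the request consumes a lot of server memory; 2. scrape in pages, so that each request handles only a small batch.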
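The second option comes down to a little arithmetic that both examples in this post repeat. As a minimal sketch, assuming PHP 7.0 or later and a page size greater than zero, it could be isolated like this (the `pagePlan` name is hypothetical and not part of ThinkPHP):

```php
<?php
// Hypothetical helper (not part of ThinkPHP): computes the paging values
// that one request of a paginated scraping job needs.
// Assumes $pageSize is greater than zero.
function pagePlan(int $total, int $pageSize, int $page): array
{
    $page   = max(1, $page);                      // clamp invalid page numbers to 1
    $pages  = (int) ceil($total / $pageSize);     // total number of pages
    $offset = $pageSize * ($page - 1);            // index of the first item on this page
    $next   = $page < $pages ? $page + 1 : null;  // null means this is the last page

    return ['pages' => $pages, 'offset' => $offset, 'next' => $next];
}
```

Each request processes one page and then redirects to `next`; when `next` is null the job is finished.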
1. Basic paginated scraping
/**
 * @Author [ CopyLian ]
 * @Date: [ 2018.09.18 ]
 * @Email: [ copylian@aikehou.com ]
 * @Site: [ https://www.copylian.com/ ]
 * @Description: [ Fetch the current-season data of NBA players ]
 */
public function getPlayerSeasonData(){
    $pagesize = 2; // players per page
    $page = intval(input('page',1)); // current page
    $page = $page <= 0 ? 1 : $page;
    $teamPlayerInfo = DB::name('nbateam_player')->page($page,$pagesize)->column('playerId');
    // season types: 0 = preseason, 1 = regular season, 2 = playoffs
    $seasonType = array(0,1,2);
    if(!empty($teamPlayerInfo)){
        $total = DB::name('nbateam_player')->count();
        $pages = ceil($total / $pagesize); // total number of pages
        if ($page <= $pages){ // not past the last page yet
            foreach ($seasonType as $key_1 => $val_1){
                foreach ($teamPlayerInfo as $key => $val){
                    $get_data_url = 'http://ziliaoku.sports.qq.com/cube/index?cubeId=9&dimId=7,8&params=t27:2017|t28:'.$val_1.'|t1:'.$val.'&from=sportsdatabase';
                    $player_data = file_get_contents($get_data_url);
                    $player_data = json_decode($player_data,true);
                    // skip failed requests, invalid JSON and non-zero status codes
                    if(!is_array($player_data) || $player_data['code'] != 0){
                        continue;
                    }
                    $player_data = $player_data['data']['nbaPlayerMatch'];
                    // TODO: process $player_data and write it to the database
                }
            }
            // this page is done: redirect to the next page
            $page++;
            $p = $page - 1;
            $this->success('Scraping... page ' . $p . ' of ' . $pages . ' done',url('index/index/index',['page' => $page]));
        }
    }
    $this->success('Scraping complete!','index/show');
}
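The method above fetches each URL with file_get_contents, which has no explicit timeout here, so one slow response can stall a whole page. Since the introduction mentions curl, here is a minimal sketch of a curl-based fetch with timeouts plus a separate validation step. It assumes PHP 7.1 or later with the curl extension, and it assumes the response shape used above (a `code` field that is 0 on success and a `data` object). The names `fetchRaw` and `extractPayload` and the timeout values are hypothetical choices, not part of ThinkPHP or of the original code.

```php
<?php
// Hypothetical helper: fetch a URL with curl and explicit timeouts.
// Returns the response body as a string, or false on failure.
function fetchRaw(string $url, int $timeoutSeconds = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 5,                // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // seconds allowed for the whole request
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    return $body;
}

// Hypothetical helper: decode a raw response and return $decoded['data'][$key],
// or null when the request failed, the JSON is invalid, the status code is
// not 0, or the expected key is missing.
function extractPayload($raw, string $key): ?array
{
    if (!is_string($raw) || $raw === '') {
        return null;
    }
    $decoded = json_decode($raw, true);
    if (!is_array($decoded) || !isset($decoded['code']) || (int) $decoded['code'] !== 0) {
        return null;
    }
    if (!isset($decoded['data'][$key]) || !is_array($decoded['data'][$key])) {
        return null;
    }

    return $decoded['data'][$key];
}
```

Inside the inner loop this would read `$player_data = extractPayload(fetchRaw($get_data_url), 'nbaPlayerMatch');`, followed by `continue` when the result is null.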
2. Paginated scraping over an array
/**
 * @Author [ CopyLian ]
 * @Date: [ 2018.09.27 ]
 * @Email: [ copylian@aikehou.com ]
 * @Site: [ https://www.copylian.com/ ]
 * @Description: [ Fetch NBA team trades and signings ]
 */
public function getNbaTransaction(){
    $get_url = 'https://dc.qiumibao.com/data/json_v2/nba_jiaoyi_2017.htm';
    $info = file_get_contents($get_url);
    $info = json_decode($info,true);
    $team_info = $info['data'];
    if(empty($team_info)){
        return 'No data!';
    }
    $pagesize = 5; // items per page
    $page = intval(input('page',1)); // current page
    $page = $page <= 0 ? 1 : $page;
    $total = count($team_info); // total number of array items
    $pages = ceil($total / $pagesize); // total number of pages
    $offset = $pagesize * ($page - 1); // index of the first item on this page
    if ($page <= $pages) { // not past the last page yet
        // process the data
        foreach ($team_info as $key => $val) {
            // only handle items whose index falls inside the current page
            // (this relies on $team_info having sequential integer keys)
            if ($key >= $offset && $key <= ($pagesize * $page) - 1) {
                // TODO: process $val and write it to the database
            }
        }
        // this page is done: redirect to the next page
        $page++;
        $p = $page - 1;
        $this->success('Scraping... page ' . $p . ' of ' . $pages . ' done',url('index/index',['page' => $page]));
    }
    $this->success('Scraping complete!','index/show');
}
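The loop above walks the entire array on every request and compares each key against the page bounds, which only works when the keys are sequential integers starting at 0. A possible alternative, shown here as a minimal sketch and not as the author's method, is to cut out the current page with PHP's built-in array_slice. The `slicePage` name is hypothetical, and PHP 7.0 or later is assumed.

```php
<?php
// Hypothetical helper: return only the items that belong to the given page.
// array_slice() works by position, so it does not depend on the keys being
// sequential integers; the fourth argument keeps the original keys.
function slicePage(array $items, int $page, int $pageSize): array
{
    $page   = max(1, $page);            // clamp invalid page numbers to 1
    $offset = $pageSize * ($page - 1);  // position of the first item on this page

    return array_slice($items, $offset, $pageSize, true);
}
```

With this helper the inner condition is no longer needed: `foreach (slicePage($team_info, $page, $pagesize) as $key => $val) { ... }` visits exactly the items of the current page. Note that the full list is still downloaded on every request, as in the original method.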