Detecting Sensitive Words in Content with Baidu's Text Moderation API

Author: 雨祺

Source: Original

Published 2024-05-19 12:10:47

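A sensitive-word detection tool is a piece of software that detects sensitive words in text or speech and either raises a warning or filters them out. Such tools are typically used wherever users generate content, such as social media platforms, online forums, and website comment sections, to keep offensive, violent, or inappropriate language, images, or video out of what gets published. A detection tool usually breaks the text into individual words and compares them against a predefined list of sensitive words. When it finds a match, it acts automatically, for example by editing or deleting the offending content. These tools are usually built on machine learning and natural language processing, and their accuracy and efficiency depend on data quality and algorithm design. With that background, let me walk you through using Baidu's text moderation API to check whether content contains sensitive words.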
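
For contrast with the API-based approach, the list-matching idea described in this introduction can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: findListedWords and the three-word list are hypothetical, and the token split only suits space-delimited text. Chinese text is not space-delimited, so a real tool would need word segmentation or substring matching instead.

```php
<?php
// Hypothetical helper: returns the listed words that occur as whole tokens in $text.
function findListedWords(string $text, array $wordList): array
{
    // Lowercase the text, then split on anything that is not a letter or digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $listed = array_map('mb_strtolower', $wordList);

    // Keep each matching token once and re-index the result from 0.
    return array_values(array_unique(array_intersect($tokens, $listed)));
}

// Illustrative word list, not a real moderation list.
$hits = findListedWords('This spam post is a SCAM, pure spam.', ['spam', 'scam', 'fraud']);
```

Here $hits ends up holding the two listed words found in the sentence, each reported once regardless of case or repetition.
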
First, let's look at the result (the output is JSON). The newstext content parameter can be sent by either POST or GET.

https://www.meiweny.cn/ecmsapi/index.php?mod=jiexi&act=baidchekconcent

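Because the newstext parameter has a built-in default text, opening the address above returns the check result for that default text.

To check your own text, pass it in newstext. Taking a GET request as an example, the address becomes: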

https://www.meiweny.cn/ecmsapi/index.php?mod=jiexi&act=baidchekconcent&newstext=%E4%B8%BA%E4%BB%80%E4%B9%88%E7%BD%91%E7%AB%99%E5%81%9A%E4%B8%8D%E8%B5%B7%E6%9D%A5%E5%91%A2
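
The two ways of passing newstext can be sketched in PHP as follows. This is a minimal sketch, not the site's own client code: buildCheckUrl and postCheck are hypothetical names, and it assumes that for POST the mod and act routing parameters stay in the query string while newstext goes in the form body.

```php
<?php
// Endpoint taken from the article; the function names below are illustrative.
const CHECK_ENDPOINT = 'https://www.meiweny.cn/ecmsapi/index.php';

// Builds the GET URL for a given text; http_build_query percent-encodes the value.
function buildCheckUrl(string $text): string
{
    $query = http_build_query([
        'mod'      => 'jiexi',
        'act'      => 'baidchekconcent',
        'newstext' => $text,
    ], '', '&');

    return CHECK_ENDPOINT . '?' . $query;
}

// Sends the text by POST instead; returns the raw JSON body, or null on failure.
// Assumption: mod and act remain in the query string for POST requests.
function postCheck(string $text): ?string
{
    $ch = curl_init(CHECK_ENDPOINT . '?mod=jiexi&act=baidchekconcent');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(['newstext' => $text], '', '&'),
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);

    return $body === false ? null : $body;
}

// Reproduces the example address shown above.
$url = buildCheckUrl('为什么网站做不起来呢');
```

Building the query with http_build_query rather than by hand avoids mistakes when the text contains characters such as & or =, which would otherwise break the parameter apart.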

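To repeat, the newstext parameter supports both POST and GET. If you submit by POST, send the request with your usual Ajax asynchronous-submit code. Now for the main part: how to use Baidu's text moderation API to detect whether content contains sensitive words. The code is as follows: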

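<?php

defined("ECMSAPI_MOD") or exit;

// Baidu AI application credentials. Replace these with your own.
$appid = '71789344';
$apiKey = "Mm2fDOGFkc2NjRQCOnRFaFZz";
$secretKey = "E65r4fkhJzOVM6y3xALIlHtUgVhMyw93";

// Text to check, taken from the request; the second argument is the default test text.
$newstext = $api->param('newstext', '你是个傻逼啊，一直访问我做测试做鸡巴。', 'RepPostStr');

// Use Memcached when the extension is available, otherwise fall back to the file cache.
// The second argument selects the backend: 'mem' = Memcached, 'redis' = Redis, 'yac' = Yac; the default is file.
if (class_exists('Memcached')) {
    $cache = $api->load('cache', 'mem');
} else {
    $cache = $api->load('cache', 'file');
}

// Cache key for the access token. PHP variable names are case-sensitive, so this must be $secretKey.
$cacheName = md5($public_r['add_pcurl']) . 'pcbaiduneibaidurongcheckhecheng_access_token_' . md5($secretKey);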

/** Shared module: fetch the access token (start) */
$response = $cache->get($cacheName);
if (empty($response['access_token'])) {
    $auth_url = "https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=" . $apiKey . "&client_secret=" . $secretKey;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $auth_url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); // verify the certificate; the secret key travels over this connection
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);    // check that the certificate matches the host name
    $res = curl_exec($ch);
    if (curl_errno($ch)) {
        print curl_error($ch);
    }
    curl_close($ch);
    $response = json_decode($res, true);
    // Cache only a response that actually contains a token, for 7 days.
    if (!empty($response['access_token'])) {
        $cache->set($cacheName, $response, 3600 * 24 * 7);
    }
}
$token = $response['access_token'] ?? '';

// Send a POST request and return the raw response body (false on failure).
function request_post($url = '', $param = '')
{
    if (empty($url) || empty($param)) {
        return false;
    }
    // Encode array parameters as application/x-www-form-urlencoded.
    $curlPost = is_array($param) ? http_build_query($param) : $param;
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $curlPost);
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}

// Text moderation endpoint; the access token is passed in the query string.
$url = 'https://aip.baidubce.com/rest/2.0/solution/v1/text_censor/v2/user_defined?access_token=' . $token;
$bodys = array(
    'text' => $newstext
);
$res = request_post($url, $bodys);
$resArray = json_decode($res, true);

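// conclusionType 1 means the text is compliant ('合规').
if (isset($resArray['conclusionType']) && $resArray['conclusionType'] == 1) {
    $api->load('fun')->json(1, '合规');
} else {
    $minganci = []; // sensitive-word hits: reason, word list name, matched keyword
    foreach ($resArray['data'] ?? [] as $dataItem) {
        $msg = $dataItem['msg'] ?? '';
        if (isset($dataItem['hits']) && is_array($dataItem['hits'])) {
            foreach ($dataItem['hits'] as $hit) {
                if (isset($hit['words']) && is_array($hit['words']) && count($hit['words']) > 0) {
                    $keyword = $hit['words'][0];
                    $datasetName = $hit['datasetName'] ?? '';
                    $minganci[] = ['msg' => $msg, 'datasetName' => $datasetName, 'keyword' => $keyword];
                }
            }
        }
    }
    // Escape each keyword for use in a regular expression, including the "/" delimiter.
    $keywordsminganci = array_unique(array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, array_column($minganci, 'keyword')));
    $censoredText = $newstext;
    if (count($keywordsminganci) > 0) {
        $pattern = '/(' . implode('|', $keywordsminganci) . ')/iu'; // no word-boundary anchors, so keywords match anywhere
        $censoredText = preg_replace_callback($pattern, function ($matches) {
            // One asterisk per character; strlen() counts bytes and would over-mask UTF-8 text.
            return str_repeat('*', mb_strlen($matches[1], 'UTF-8'));
        }, $newstext);
    }
    $result = ['minganci' => $minganci, 'newstext' => $newstext, 'censoredText' => $censoredText];
    $api->load('fun')->json(0, $result, '含有违禁词'); // message: contains prohibited words
}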

The script above only runs with this site's plugin installed. Download the universal API plugin here:

https://www.meiweny.cn/zazhi/ruanjianleyuan/2.html
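
To make the result handling easier to follow without the plugin or a live API call, here is a minimal standalone sketch of the two steps: collecting the hits and masking them. It rests on several assumptions. extractHits and maskKeywords are hypothetical names, $sample is a hand-written array rather than real API output, and the field names (data, msg, hits, datasetName, words) are simply the ones the script reads. Unlike the script, the sketch collects every entry in words, not only the first.

```php
<?php
// Collects one row per matched word from a decoded response array.
function extractHits(array $response): array
{
    $rows = [];
    foreach ($response['data'] ?? [] as $item) {
        foreach ($item['hits'] ?? [] as $hit) {
            foreach ($hit['words'] ?? [] as $word) {
                $rows[] = [
                    'msg'         => $item['msg'] ?? '',
                    'datasetName' => $hit['datasetName'] ?? '',
                    'keyword'     => $word,
                ];
            }
        }
    }
    return $rows;
}

// Replaces every occurrence of each keyword with one '*' per character.
function maskKeywords(string $text, array $keywords): string
{
    $keywords = array_unique(array_filter($keywords, fn ($k) => $k !== ''));
    if (count($keywords) === 0) {
        return $text; // nothing to mask
    }
    // Escape regex metacharacters, including the '/' delimiter.
    $quoted  = array_map(fn ($k) => preg_quote($k, '/'), $keywords);
    $pattern = '/' . implode('|', $quoted) . '/iu';

    $masked = preg_replace_callback(
        $pattern,
        // mb_strlen counts characters, so a two-character Chinese word becomes '**'.
        fn ($m) => str_repeat('*', mb_strlen($m[0], 'UTF-8')),
        $text
    );
    return $masked ?? $text; // preg_replace_callback returns null on error
}

// Hand-written sample for illustration only, NOT real API output.
$sample = [
    'conclusionType' => 2,
    'data' => [
        ['msg' => 'example reason', 'hits' => [
            ['datasetName' => 'example-list', 'words' => ['坏词', 'spam']],
        ]],
    ],
];

$rows   = extractHits($sample);
$masked = maskKeywords('这是坏词和 SPAM', array_column($rows, 'keyword'));
```

With this sample, $rows contains two entries and $masked keeps the original character count, with each flagged character replaced by an asterisk.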

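That completes the tutorial on using Baidu's moderation API to detect whether content contains sensitive words. The logic really is that simple. To check other field values through this endpoint, just change what you pass in the newstext parameter; a basic grasp of Ajax requests is enough. There is also another approach: create a database table of sensitive words, enter the words by hand, and then loop over them to detect and replace matches. Even so, I recommend Baidu's text moderation API: it is fully automatic, fits better with today's AI landscape, and covers a more complete set of sensitive words.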
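
The manual word-table alternative mentioned above could look roughly like this. It is a sketch under assumptions: the sensitive_words table and its word column are hypothetical, and a plain array stands in for the rows a query such as SELECT word FROM sensitive_words would return.

```php
<?php
// Stand-in for rows loaded from a hypothetical `sensitive_words` table.
$wordRows = [
    ['word' => 'spam'],
    ['word' => '广告'],
];

// Loops over the word list and masks every occurrence, one '*' per character.
function censorWithWordList(string $text, array $wordRows): string
{
    foreach ($wordRows as $row) {
        $word = $row['word'] ?? '';
        if ($word === '') {
            continue; // skip empty entries
        }
        $stars   = str_repeat('*', mb_strlen($word, 'UTF-8'));
        $pattern = '/' . preg_quote($word, '/') . '/iu';
        // Fall back to the unchanged text if the regex engine reports an error.
        $text = preg_replace($pattern, $stars, $text) ?? $text;
    }
    return $text;
}

$censored = censorWithWordList('Buy SPAM 广告 now', $wordRows);
```

The trade-off is the one the article describes: this version needs no external service, but it only catches words someone has entered by hand.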

[Reviewed by: site administrator]
