AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

Overview

paddle-wechaty-Zodiac

AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

12星座若穿越科幻剧,会拥有什么超能力呢?快来迎接你的专属超能力吧!

  • 现在很多年轻人都喜欢看科幻剧,像是复仇者系列,里面有很多英雄、超能力者,这些人都是我们的青春与情怀,那么12星座若穿越科幻剧,会分别拥有什么超能力呢?
  • 利用paddle提供的文本匹配和对话闲聊模型结合wechaty进行构建
  • 除了获得专属超能力外,还可以查看今日运势,外加寂寞无聊时找个“算命大师”聊聊天

效果展示

本项目的实现过程

云服务器部分

  • 参考https://aistudio.baidu.com/aistudio/projectdetail/2279551

  • 我用的阿里云的云服务器,也可以考虑其他云服务或者是外网可访问的服务器资源。

  • 进入服务器终端,在终端输入以下命令(注:确保输入的端口是对外开放的,WECHATY_TOKEN请填写自己的token)

$ apt update

$ apt install docker.io

$ docker pull wechaty/wechaty:latest

$ export WECHATY_LOG="verbose"

$ export WECHATY_PUPPET="wechaty-puppet-wechat"

$ export WECHATY_PUPPET_SERVER_PORT="8080"

$ export WECHATY_TOKEN="puppet_padlocal_xxxxxx" # 这里输入你自己的token

$ docker run -ti --name wechaty_puppet_service_token_gateway --rm -e WECHATY_LOG -e WECHATY_PUPPET -e WECHATY_TOKEN -e WECHATY_PUPPET_SERVER_PORT -p "$WECHATY_PUPPET_SERVER_PORT:$WECHATY_PUPPET_SERVER_PORT" wechaty/wechaty:latest
  • 输入网址: https://api.chatie.io/v0/hosties/xxxxxx (后面的xxxxxx是自己的token),如果返回了服务器的ip地址以及端口号,就说明运行成功了

  • 运行后会输出一大堆东西,找到一个Online QR Code: 的地址点击进去,会出现二维码,微信扫码登录,最终手机上显示桌面微信已登录,即可。

环境安装

!pip install -U paddlepaddle -i https://mirror.baidu.com/pypi/simple
!python -m pip install --upgrade paddlenlp -i https://pypi.org/simple
!pip install --upgrade pip
!pip install --upgrade sentencepiece 
!pip install wechaty

文本匹配部分

  • 文本语义匹配是NLP最基础的任务之一,简单来说就是判断两段文本的语义相似度。应用场景广泛,比如搜索引擎、智能问答、知识检索、信息流推荐等。

  • 为什么要用上这个功能呢,因为如果我们直接基于关键词匹配去判断用户需求的话,可能会出现理解错误的情况。比如如果用户输入“我可讨厌星座了”,但是聊天机器人可能还是会给用户展示星座超能力;如果直接限于关键词“星座”严格匹配的话,那用户如果不小心多输入一个字或者标点符号都不能实现想要的功能,太不友好了。因此本项目利用文本匹配技术来判断用户是否真实需要查看星座未来超能力的功能。

  • 本次项目基于 PaddleNLP,使用百度开源的预训练模型 ERNIE1.0,构建语义匹配模型,来判断 2 个文本语义是否相同。

  • 从头训练一个模型的关键步骤有数据加载、数据预处理、模型搭建、模型训练和评估,具体可参照https://aistudio.baidu.com/aistudio/projectdetail/1972174, 在这我们就直接调用已经训练好的语义匹配模型进行应用。

  • 下载已经训练好的语义匹配模型, 并解压

! wget https://paddlenlp.bj.bcebos.com/models/text_matching/pointwise_matching_model.tar
! tar -xvf pointwise_matching_model.tar
  • 具体代码部分(match.py文件)

对话闲聊部分

  • 近年来,人机对话系统受到了学术界和产业界的广泛关注。开放域对话系统希望机器可以流畅自然地与人进行交互,既可以进行日常问候类的闲聊,又可以完成特定功能。

  • 随着深度学习技术的不断发展,聊天机器人变得越来越智能。我们可以通过机器人来完成一些机械性的问答工作,也可以在闲暇时和智能机器人进行对话,他们的出现让生活变得更丰富多彩。

  • 本项目载入该功能,也是希望人们可以在寂寞无聊的时候有个聊天的小伙伴,虽然有时候他可能会不知所云,但他永远会在那等你。

  • 具体代码部分(chat.py文件)

  • PaddleNLP针对生成式任务提供了generate()函数,内嵌于PaddleNLP所有的生成式模型。支持Greedy Search、Beam Search和Sampling解码策略,用户只需指定解码策略以及相应的参数即可完成预测解码,得到生成的sequence的token ids以及概率得分。

  • PaddleNLP对于各种预训练模型已经内置了相应的tokenizer,指定想要使用的模型名字即可加载对应的tokenizer。

  • PaddleNLP提供了GPT,UnifiedTransformer等中文预训练模型,可以通过预训练模型名称完成一键加载。这次用的是一个小的中文GPT预训练模型。其他预训练模型请参考模型列表

主函数部分(main.py)

  • 星座今日运势需要自己申请一下接口,网址:星座运势,将你申请到的APIKEY填入values['key']
  • 运行本函数的时候,不要忘记云服务器也要开启哦,这样你的微信号才能变身为算命大师哦,不然只能在本地感受了。

后记

项目可改进的部分

  • 闲聊模型不够强大,会出现不知所云的情况,又或者更偏向于文本续写而不是对话。甚至会出现不太好的话语,这应该是训练语料导致的,模型预训练时语料没有清洗,以后可以考虑用干净的更符合本项目的对话语料微调。
  • 对话语义匹配的调用方式过于单调,希望之后可以改进。
  • 功能还不够多,结合paddle提供的优秀资源还可以创造出更多更好玩的功能。

互相交流进步

参考资料

最后的最后,觉得项目写的好的话,记得fork和爱心哦,谢谢啦!!!

TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition

TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition Xue, Wenyuan, et al. "TGRNet: A Table Graph Reconstruction Network for Ta

Wenyuan 68 Jan 04, 2023
Pipeline code for Sequential-GAM(Genome Architecture Mapping).

Sequential-GAM Pipeline code for Sequential-GAM(Genome Architecture Mapping). mapping whole_preprocess.sh include the whole processing of mapping. usa

3 Nov 03, 2022
Full Stack Deep Learning Labs

Full Stack Deep Learning Labs Welcome! Project developed during lab sessions of the Full Stack Deep Learning Bootcamp. We will build a handwriting rec

Full Stack Deep Learning 1.2k Dec 31, 2022
A non-linear, non-parametric Machine Learning method capable of modeling complex datasets

Fast Symbolic Regression Symbolic Regression is a non-linear, non-parametric Machine Learning method capable of modeling complex data sets. fastsr aim

VAMSHI CHOWDARY 3 Jun 22, 2022
The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection .

GCoNet The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection . Trained model Download final_gconet.pth

Qi Fan 46 Nov 17, 2022
U-Net: Convolutional Networks for Biomedical Image Segmentation

Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition, using Keras This tutorial shows how to use Keras library to build deep ne

Yihui He 401 Nov 21, 2022
A certifiable defense against adversarial examples by training neural networks to be provably robust

DiffAI v3 DiffAI is a system for training neural networks to be provably robust and for proving that they are robust. The system was developed for the

SRI Lab, ETH Zurich 202 Dec 13, 2022
OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis Overview OpenABC-D is a large-scale labeled dataset generate

NYU Machine-Learning guided Design Automation (MLDA) 31 Nov 22, 2022
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding Website • Colab • Paper This repository contains code and links to pre-trained mod

Aishwarya Kamath 770 Dec 28, 2022
Research using Cirq!

ReCirq Research using Cirq! This project contains modules for running quantum computing applications and experiments through Cirq and Quantum Engine.

quantumlib 230 Dec 29, 2022
The pytorch implementation of DG-Font: Deformable Generative Networks for Unsupervised Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation The source code for 'DG-Font: Deformable Generative Networks for Unsupervised

130 Dec 05, 2022
Encode and decode text application

Text Encoder and Decoder Encode and decode text in many ways using this application! Encode in: ASCII85 Base85 Base64 Base32 Base16 Url MD5 Hash SHA-1

Alice 1 Feb 12, 2022
4th place solution to datafactory challenge by Intermarché.

Solution to Datafactory challenge by Intermarché. 4th place solution to datafactory challenge by Intermarché. The objective of the challenge is to pre

Raphael Sourty 11 Mar 19, 2022
Neural Network to colorize grayscale images

#colornet Neural Network to colorize grayscale images Results Grayscale Prediction Ground Truth Eiji K used colornet for anime colorization Sources Au

Pavel Hanchar 3.6k Dec 24, 2022
Kaggle Ultrasound Nerve Segmentation competition [Keras]

Ultrasound nerve segmentation using Keras (1.0.7) Kaggle Ultrasound Nerve Segmentation competition [Keras] #Install (Ubuntu {14,16}, GPU) cuDNN requir

179 Dec 28, 2022
EasyMocap is an open-source toolbox for markerless human motion capture from RGB videos.

EasyMocap is an open-source toolbox for markerless human motion capture from RGB videos. In this project, we provide the basic code for fitt

ZJU3DV 2.2k Jan 05, 2023
Unofficial Implementation of Oboe (SIGCOMM'18').

Oboe-Reproduce This is the unofficial implementation of the paper "Oboe: Auto-tuning video ABR algorithms to network conditions, Zahaib Akhtar, Yun Se

Tianchi Huang 13 Nov 04, 2022
Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

This is the official code for Multi-attribute Pizza Generator (MPG2): Cross-domain Attribute Control with Conditional StyleGAN. Paper Demo Setup Envir

Fangda Han 5 Sep 01, 2022
Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks

Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks arXiv preprint: https://arxiv.org/abs/2201.02143. Architec

19 Nov 30, 2022
A universal memory dumper using Frida

Fridump Fridump (v0.1) is an open source memory dumping tool, primarily aimed to penetration testers and developers. Fridump is using the Frida framew

551 Jan 07, 2023