Transformers Pipeline
Inference with the Transformers Pipeline
Install the dependencies
pip install datasets evaluate transformers[sentencepiece]
English sentiment classification
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
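The warning above is the pipeline telling you it fell back to a default checkpoint. To make the choice explicit (and reproducible), pass the model name yourself; this pins the same checkpoint the warning mentions:

from transformers import pipeline

# Pin the checkpoint explicitly (optionally also revision=...) instead of relying on the default
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)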
Chinese sentiment classification
Model: uer/roberta-base-finetuned-dianping-chinese
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
checkpoint = "uer/roberta-base-finetuned-dianping-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
text_classification = pipeline('sentiment-analysis', tokenizer=tokenizer, model=model)
raw_inputs = [
    "我一直都在等待 HuggingFace 课程。",
    "我非常讨厌这个!",
]
outputs = text_classification(raw_inputs)
print(outputs)
[{'label': 'positive (stars 4 and 5)', 'score': 0.8151589035987854},
 {'label': 'negative (stars 1, 2 and 3)', 'score': 0.9962126016616821}]
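If you want the scores for every label rather than only the top one, recent transformers versions let the text-classification pipeline take a top_k argument (older versions used return_all_scores=True instead; which one applies depends on your installed version):

# Return scores for all labels instead of only the best one
text_classification(raw_inputs, top_k=None)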
Breaking the Pipeline down
A pipeline chains three steps: preprocessing, model inference, and postprocessing.
❶ Preprocessing
from transformers import AutoTokenizer
checkpoint = "uer/roberta-base-finetuned-dianping-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
raw_inputs = [
    "我一直都在等待 HuggingFace 课程。",
    "我非常讨厌这个!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
Output:
{
    'input_ids': tensor([
        [  101,  2769,   671,  4684,  6963,  1762,  5023,  2521, 12199,  9949,  8221, 12122,  6440,  4923,   511,   102],
        [  101,  2769,  7478,  2382,  6374,  1328,  6821,   702,  8013,   102,     0,     0,     0,     0,     0,     0]
    ]), 
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    ])
}
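To see what those IDs mean, you can map them back to tokens; note the [CLS]/[SEP] special tokens and the trailing [PAD] tokens that the attention mask zeroes out:

# Inspect the second (shorter, padded) input as tokens
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][1]))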
❷ Inference
from transformers import AutoModelForSequenceClassification
checkpoint = "uer/roberta-base-finetuned-dianping-chinese"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits)
Output:
tensor([[-0.7019,  0.6344],
        [ 2.8446, -2.7260]], grad_fn=<AddmmBackward0>)
The logits are not probabilities; they are the raw, unnormalized scores from the model's final layer.
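As a quick check, applying softmax to the second row of logits reproduces the 0.9962 score the pipeline reported earlier:

import torch
# softmax turns raw logits into probabilities that sum to 1
print(torch.softmax(torch.tensor([2.8446, -2.7260]), dim=-1))
# tensor([0.9962, 0.0038])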
❸ Postprocessing
import torch

labels = model.config.id2label
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
for text, prediction in zip(raw_inputs, predictions):
    score = torch.max(prediction).item()
    label = labels[torch.argmax(prediction).item()]
    print(f"input: {text}")
    print(f"label: {label.upper()}, score: {score}")
    print()
Output:
input: 我一直都在等待 HuggingFace 课程。
label: POSITIVE (STARS 4 AND 5), score: 0.7918757796287537
input: 我非常讨厌这个!
label: NEGATIVE (STARS 1, 2 AND 3), score: 0.9962064027786255
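Putting the three steps together, here is a minimal sketch of what pipeline('sentiment-analysis') does internally, reusing the tokenizer and model loaded above (the real pipeline additionally handles device placement, batching, and framework selection):

import torch

def classify(texts):
    # ❶ preprocessing: tokenize with padding/truncation
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # ❷ inference: forward pass without gradient tracking
    with torch.no_grad():
        logits = model(**inputs).logits
    # ❸ postprocessing: softmax + label lookup
    probs = torch.softmax(logits, dim=-1)
    scores, ids = probs.max(dim=-1)
    return [
        {"label": model.config.id2label[i.item()], "score": s.item()}
        for i, s in zip(ids, scores)
    ]

print(classify(raw_inputs))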
LLM.int8: save GPU memory & speed up inference
Install the dependencies
pip install bitsandbytes accelerate scipy
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
checkpoint = "uer/roberta-base-finetuned-dianping-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map / load_in_8bit are model-loading options, not tokenizer options
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)
text_classification = pipeline('sentiment-analysis', tokenizer=tokenizer, model=model)
raw_inputs = [
    "我一直都在等待 HuggingFace 课程。",
    "我非常讨厌这个!",
]
outputs = text_classification(raw_inputs)
print(outputs)
[{'label': 'positive (stars 4 and 5)', 'score': 0.8151589035987854}, 
 {'label': 'negative (stars 1, 2 and 3)', 'score': 0.9962126016616821}]
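Newer transformers releases prefer a quantization config over the bare load_in_8bit flag; under that assumption, an equivalent load would look like this:

from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

model = AutoModelForSequenceClassification.from_pretrained(
    "uer/roberta-base-finetuned-dianping-chinese",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)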
Measuring execution time in Colab
Install the extension
pip install ipython-autotime
Add this line to a code cell:
%load_ext autotime
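Outside Colab/IPython, the standard library gives a comparable wall-clock measurement (a minimal sketch; replace the placeholder with the code you want to time):

import time

start = time.perf_counter()
# ... code to measure ...
print(f"time: {time.perf_counter() - start:.1f}s")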
Without LLM.int8
%load_ext autotime
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
checkpoint = "uer/roberta-base-finetuned-dianping-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
text_classification = pipeline('sentiment-analysis', tokenizer=tokenizer, model=model)
raw_inputs = [
    "我一直都在等待 HuggingFace 课程。",
    "我非常讨厌这个!",
]
for _ in range(500):
    outputs = text_classification(raw_inputs)
print('Memory footprint:', model.get_memory_footprint())
Memory footprint: 204546564
time: 2min 21s
With LLM.int8
%load_ext autotime
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
checkpoint = "uer/roberta-base-finetuned-dianping-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# as above, the 8-bit options go on the model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)
text_classification = pipeline('sentiment-analysis', tokenizer=tokenizer, model=model)
raw_inputs = [
    "我一直都在等待 HuggingFace 课程。",
    "我非常讨厌这个!",
]
for _ in range(500):
    text_classification(raw_inputs)
print('Memory footprint:', model.get_memory_footprint())
Memory footprint: 119022084
time: 1min 46s
Summary
- Without LLM.int8: memory 205 MB; time 141 s
- With LLM.int8: memory 119 MB; time 106 s
Memory usage drops 📉 by about 42%, and inference speed improves 📈 by about 33% (141 s / 106 s ≈ 1.33×).
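A quick sanity check of those percentages, computed from the measured numbers above:

mem_fp32, mem_int8 = 204546564, 119022084   # bytes, from get_memory_footprint()
t_fp32, t_int8 = 141, 106                   # seconds, from autotime
print(f"memory reduction: {1 - mem_int8 / mem_fp32:.0%}")  # ~42%
print(f"throughput gain:  {t_fp32 / t_int8 - 1:.0%}")      # ~33%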