LLMs vs. Traditional NLP/ML/Software Engineering
- Traditional software/ML/NLP are precise, rule-driven, and task-specific.
- LLMs / AIGC are general-purpose, context-aware, and capable of intelligent generation and interaction.
- Adopting LLMs and AIGC helps enterprises boost productivity, automate processes, enhance creativity, and improve user experiences.
- Recommended entry points:
  - Customer service bots
  - Internal knowledge assistants
  - Marketing content generation
  - AI pair programming tools
  - AI-powered analytics interfaces
LLM / AIGC vs. Traditional Software, ML, and NLP
Category | Examples | Core Characteristics | Data/Training Dependency | Capability Scope |
---|---|---|---|---|
Traditional Software | ERP, CRM, Web Apps | Rule-based, deterministic logic | No training needed, logic is explicitly coded | Limited to predefined tasks (e.g., reporting, workflows) |
Traditional ML | Linear regression, decision trees, SVM | Feature-driven, narrow generalization | Requires manual feature engineering and labeled data | Task-specific optimization (e.g., demand forecasting) |
Traditional NLP | TF-IDF, LSTM, CRF | Task-specific, modular approach | Needs labeled corpora for each task | Poor generalization, brittle across tasks |
LLMs (Large Language Models) | GPT, Claude, Gemini | Pretraining + fine-tuning, self-supervised | Trained on large-scale unlabeled text | Strong generalization, few/zero-shot across tasks |
AIGC (AI-Generated Content) | Text, image, video, code generation | Generative, often multimodal | Depends on massive, diverse datasets | Capable of creating novel content beyond automation |
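The "few/zero-shot" row above can be made concrete with a sketch of few-shot prompting: instead of training a task-specific classifier, the task is specified entirely through in-context examples. The prompt-building logic below is runnable as-is; the actual model call is left out because it depends on whichever chat-completion API (OpenAI, Anthropic, etc.) an enterprise adopts.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot sentiment-classification prompt from labeled examples."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The model is expected to complete the final "Sentiment:" line.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("Great battery life and fast shipping.", "positive"),
    ("Broke after two days, waste of money.", "negative"),
]
prompt = build_few_shot_prompt(examples, "The screen is gorgeous.")
print(prompt)
```

The same task with traditional ML would require a labeled training corpus, feature engineering, and a fitted model; here the two labeled examples in the prompt are the entire "training" step.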
Structured vs. Unstructured Tasks
Task | Data Type | Representative Tech | Sample Task |
---|---|---|---|
Structured | Table, Database | Traditional ML, SQL | Customer Rating Prediction; Price Prediction |
Unstructured | Text, Image, Voice | NLP, LLM, CV | Question Answering, Content Generation, Image Recognition |
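A minimal sketch of the structured-data column: price prediction from a single numeric feature via closed-form simple linear regression. The toy data is invented for illustration; a real project would use scikit-learn or a SQL feature pipeline rather than hand-rolled least squares.

```python
def fit_simple_linear(xs, ys):
    """Return (slope, intercept) minimizing squared error (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Toy data: floor area (m^2) -> price (in 10k units), exactly price = 3 * area
areas  = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]

slope, intercept = fit_simple_linear(areas, prices)
predicted = slope * 90 + intercept  # predict price for a 90 m^2 unit
print(round(slope, 2), round(intercept, 2), round(predicted, 2))  # 3.0 0.0 270.0
```

The point of the contrast: this kind of tabular task has a fixed schema and a numeric objective, so a small task-specific model suffices; the unstructured tasks in the row below have no such schema, which is where NLP/LLM/CV methods come in.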
Evaluation
- Subjective evaluation: set up rating rules, then score outputs with human or AI judges
- Objective metrics (see table below): computed automatically against a held-out test dataset
Type | Common Tasks | Metrics | Characteristics |
---|---|---|---|
NLU | Classification, QA, etc. | Accuracy, F1, EM, Span-F1 | Measures semantic understanding; commonly used for classification, entity recognition, and question answering. |
Text generation | Summarization, translation, writing | BLEU, ROUGE, METEOR, BERTScore | Measures similarity between generated and reference text, e.g. BLEU computes the n-gram precision of the generated text against one or more references. |
Model-level evaluation | Language modeling | Perplexity | Lower is better; indicates the model is less "perplexed" by the data. |
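Two of the metrics above can be sketched in a few lines of pure Python: clipped n-gram precision (the core building block of BLEU, here against a single reference and without the brevity penalty) and perplexity computed from per-token probabilities. Real evaluations would use libraries such as sacrebleu or Hugging Face `evaluate` instead.

```python
import math
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against one reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)

def perplexity(token_probs):
    """exp of the average negative log-likelihood; lower is better."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

cand = "the cat sat on the mat".split()
ref  = "the cat is on the mat".split()
print(ngram_precision(cand, ref))      # 5 of 6 unigrams match -> 0.8333...
print(perplexity([0.25, 0.25, 0.25]))  # uniform over 4 choices -> 4.0
```

The perplexity example shows why "lower is better": if the model assigns every token probability 0.25, it is as "perplexed" as a uniform guess among four options, hence perplexity 4.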