LLM vs traditional NLP/ML/SDE

  • Traditional software/ML/NLP are precise, rule-driven, and task-specific.
  • LLMs / AIGC are general-purpose, context-aware, and capable of intelligent generation and interaction.
  • Adopting LLMs and AIGC helps enterprises boost productivity, automate processes, enhance creativity, and improve user experiences.
  • Recommended entry points:
    • Customer service bots
    • Internal knowledge assistants
    • Marketing content generation
    • AI pair programming tools
    • AI-powered analytics interfaces

LLM / AIGC vs. Traditional Software, ML, and NLP

| Category | Examples | Core Characteristics | Data/Training Dependency | Capability Scope |
| --- | --- | --- | --- | --- |
| Traditional Software | ERP, CRM, web apps | Rule-based, deterministic logic | No training needed; logic is explicitly coded | Limited to predefined tasks (e.g., reporting, workflows) |
| Traditional ML | Linear regression, decision trees, SVM | Feature-driven, narrow generalization | Requires manual feature engineering and labeled data | Task-specific optimization (e.g., demand forecasting) |
| Traditional NLP | TF-IDF, LSTM, CRF | Task-specific, modular approach | Needs labeled corpora for each task | Poor generalization, brittle across tasks |
| LLMs (Large Language Models) | GPT, Claude, Gemini | Pretraining + fine-tuning, self-supervised | Trained on large-scale unlabeled text | Strong generalization, few-/zero-shot across tasks |
| AIGC (AI-Generated Content) | Text, image, video, code generation | Generative, often multimodal | Depends on massive, diverse datasets | Capable of creating novel content beyond automation |
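The "rule-based, deterministic logic" row above can be made concrete with a minimal sketch: a traditional keyword-based sentiment classifier. It handles exactly the cases its author anticipated and nothing else, which is the brittleness that LLM generalization avoids. The keyword lists and function name here are made-up illustrations, not any real library's API.

```python
# Traditional-software approach: explicit, hand-coded rules.
# Deterministic, cheap, auditable — but it cannot generalize.
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "hate"}

def rule_based_sentiment(text):
    words = set(text.lower().split())
    if words & POSITIVE:
        return "positive"
    if words & NEGATIVE:
        return "negative"
    return "unknown"  # anything outside the hand-written rules falls through

print(rule_based_sentiment("I love this product"))  # positive
print(rule_based_sentiment("An absolute delight"))  # unknown: no rule matches
```

An LLM handles the second sentence zero-shot precisely because it learned semantics from large-scale pretraining rather than from an enumerated keyword list.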

Structured vs. Unstructured Tasks

| Task | Data Type | Representative Tech | Sample Tasks |
| --- | --- | --- | --- |
| Structured | Tables, databases | Traditional ML, SQL | Customer rating prediction; price prediction |
| Unstructured | Text, images, audio | NLP, LLM, CV | Intelligent Q&A, content generation, image recognition |
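The structured row (e.g., price prediction with traditional ML) can be sketched with ordinary least squares on a single feature, using only the standard library. The feature (size in sqm) and the toy data are invented for illustration:

```python
# Traditional ML on structured (tabular) data: fit y = slope * x + intercept
# by ordinary least squares — no pretraining, just a closed-form formula.
def fit_ols(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy table: house size (sqm) -> price, deliberately exactly linear
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]  # price = 3 * size

slope, intercept = fit_ols(sizes, prices)
print(slope, intercept)  # 3.0 0.0
```

This is the "task-specific optimization" pattern from the first table: one hand-chosen feature, one model, one task.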

Evaluation

  1. Subjective evaluation (主观评测): set up rating rules and score outputs against them, optionally with AI-assisted judging (e.g., an LLM as grader).
  2. Objective metrics (客观指标, see table below): score model outputs against a held-out test dataset.
| Type | Common Tasks | Metrics | Characteristics |
| --- | --- | --- | --- |
| NLU | Classification, QA, etc. | Accuracy, F1, EM, Span-F1 | Assess semantic understanding; commonly used for classification, entity recognition, and question answering. |
| Text generation | Summarization, translation, writing | BLEU, ROUGE, METEOR, BERTScore | Measure similarity between generated text and reference text, e.g., the n-gram precision of the output against one or more references. |
| Intrinsic model evaluation | Language modeling | Perplexity | Lower is better: the model is less "perplexed" by the text. |
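The core of these metrics can be sketched with short standard-library implementations. These are deliberately simplified: real BLEU adds a brevity penalty and averages over several n-gram orders, and real perplexity uses the probabilities the model itself assigns to each token.

```python
import math
from collections import Counter

def exact_match(pred, ref):
    # EM: 1 if the normalized strings are identical, else 0 (QA-style)
    return int(pred.strip().lower() == ref.strip().lower())

def token_f1(pred, ref):
    # Token-level F1, as used for span QA: overlap-based precision/recall
    p, r = pred.lower().split(), ref.lower().split()
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def ngram_precision(candidate, reference, n=2):
    # Core of BLEU: clipped n-gram precision against one reference
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    c_counts, r_counts = ngrams(candidate.split()), ngrams(reference.split())
    clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
    return clipped / max(sum(c_counts.values()), 1)

def perplexity(token_probs):
    # PPL = exp(average negative log-probability); lower is better
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(exact_match("Paris", "paris"))                 # 1
print(token_f1("the cat sat", "the cat"))            # 0.8
print(ngram_precision("the cat sat on the mat",
                      "the cat sat on a mat"))       # 0.6
print(perplexity([0.25, 0.25, 0.25, 0.25]))          # 4.0
```

The perplexity example shows the intuition directly: a model that assigns every token probability 1/4 is "choosing among 4 options" on average, so its perplexity is 4.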