# llm_extract_mapper

Extract structured fields from text using an LLM and write the results to meta. Part of the `llm_*` semantic ops family.

This operator uses an LLM to extract user-defined fields from each sample. The input text comes from the sample's text or is built from several sample keys (`input_keys`). You provide an `output_schema` that maps each output key to an extraction instruction. Results are written either as a whole to `meta[meta_output_key]` or to individual meta keys. Both structured (JSON) and unstructured (e.g. plain text, jsonl) input are supported. Token and cost usage is recorded in `meta[llm_semantic_usage]` (`prompt_tokens`, `completion_tokens`, `total_tokens`, and an optional `cost_estimate`).

Type: **mapper**

Tags: gpu, vllm, hf, api

## 🔧 Parameter Configuration

| name | type | default | desc |
|--------|------|--------|------|
| `input_keys` | list | required | Sample keys used to build the input text (e.g. `["text"]` or `["query", "response"]`). |
| `output_schema` | dict | required | `{output_key: "extraction instruction"}`. |
| `api_or_hf_model` | str | `'gpt-4o'` | Model name for the API or HuggingFace backend. |
| `meta_output_key` | str, optional | `'llm_extract'` | If set, the full result is written to `meta[meta_output_key]`. |
| `knowledge_grounding_key` | str, optional | `None` | Optional sample key holding per-sample grounding text. |
| `knowledge_grounding_fixed` | str, optional | `None` | Optional fixed grounding string applied to every sample. |
| `is_hf_model` | bool | `False` | If true, use HuggingFace Transformers. |
| `enable_vllm` | bool | `False` | If true, use the vLLM backend. |
| `api_endpoint` | str, optional | `None` | URL endpoint for the API. |
| `response_path` | str, optional | `None` | Path used to extract the content from the API response. |
| `system_prompt` | str, optional | `None` | Overrides the default extraction system prompt. |
| `try_num` | int | `3` | Number of retries on parse or API failure. |
| `model_params` | dict | `{}` | Parameters for model initialization. |
| `sampling_params` | dict | `{}` | Sampling parameters (e.g. `temperature`, `top_p`). |

## 📊 Effect demonstration

The examples below match the [unit tests](../../../tests/ops/mapper/test_llm_extract_mapper.py). **Concrete field values depend on the model and API**, so any values shown are only typical; only the shape and keys of the output are guaranteed.

### test_extract_default

```python
LLMExtractMapper(
    input_keys=["text"],
    output_schema={
        "topic": "One short phrase: main topic.",
        "sentiment": "One word: positive, negative, or neutral.",
    },
    api_or_hf_model="gpt-4o",
    meta_output_key="llm_extract",
    try_num=2,
)
```

#### 📥 input data
Sample 1: The stock market rose today. Investors are optimistic.

Sample 2: Bad weather caused delays. Many people were upset.
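Because only the shape and keys of the output are guaranteed, a check over this demo's results should test structure rather than values. Below is a minimal sketch of such a check under stated assumptions: it receives one sample's meta as a plain dict, expects the full result under `meta_output_key` (`llm_extract` here) with one entry per `output_schema` key, and expects usage under the literal key `llm_semantic_usage` holding integer `prompt_tokens`, `completion_tokens`, and `total_tokens`. The helper `check_extract_shape` and this exact layout are hypothetical, not Data-Juicer API.

```python
from collections.abc import Mapping
from typing import Any

# Assumed usage fields; cost_estimate is optional, so it is not checked.
USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def check_extract_shape(
    meta: Mapping[str, Any],
    output_schema: Mapping[str, str],
    meta_output_key: str = "llm_extract",
    usage_key: str = "llm_semantic_usage",  # assumed literal key name
) -> list[str]:
    """Hypothetical helper: return shape problems; empty list means OK."""
    problems = []

    # The full extraction result should be a dict with every schema key.
    result = meta.get(meta_output_key)
    if not isinstance(result, Mapping):
        problems.append(f"meta[{meta_output_key!r}] is missing or not a dict")
    else:
        for key in output_schema:
            if key not in result:
                problems.append(f"missing extracted field {key!r}")

    # Token usage should be a dict with integer counters.
    usage = meta.get(usage_key)
    if not isinstance(usage, Mapping):
        problems.append(f"meta[{usage_key!r}] is missing or not a dict")
    else:
        for key in USAGE_KEYS:
            if not isinstance(usage.get(key), int):
                problems.append(f"usage field {key!r} missing or not an int")

    return problems
```

Pass the same `output_schema` given to `LLMExtractMapper`. An empty list means every expected key is present; the values themselves stay model-dependent and are deliberately not compared.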
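The operator description says the input text is built from `input_keys`, the model is asked for each `output_schema` field, optional grounding can be added, and failed parses or API calls are retried up to `try_num` times. The following is a minimal sketch of that loop under stated assumptions: the model is asked to reply with one JSON object whose keys match `output_schema`, `call_llm` stands in for whichever backend (API, HuggingFace, vLLM) is configured, and the prompt wording, the `key: value` join of input keys, and the names `build_prompt` and `extract_with_retries` are all hypothetical. Data-Juicer's actual prompt and parser may differ.

```python
import json
from typing import Callable, Mapping, Optional


def build_prompt(
    sample: Mapping[str, str],
    input_keys: list[str],
    output_schema: Mapping[str, str],
    grounding: Optional[str] = None,
) -> str:
    """Hypothetical prompt builder; the wording is an assumption."""
    # Join the configured input keys into one input text.
    input_text = "\n".join(f"{k}: {sample[k]}" for k in input_keys)
    # One line per requested field with its extraction instruction.
    fields = "\n".join(f'- "{k}": {instr}' for k, instr in output_schema.items())
    parts = [
        "Extract the following fields and reply with one JSON object only.",
        fields,
    ]
    if grounding:
        parts.append(f"Background knowledge:\n{grounding}")
    parts.append(f"Input:\n{input_text}")
    return "\n\n".join(parts)


def extract_with_retries(
    call_llm: Callable[[str], str],
    prompt: str,
    output_schema: Mapping[str, str],
    try_num: int = 3,
) -> Optional[dict]:
    """Call the model up to try_num times; return the first valid result."""
    for _ in range(try_num):
        try:
            reply = call_llm(prompt)
            data = json.loads(reply)
        except Exception:  # API error or invalid JSON: try again
            continue
        if isinstance(data, dict) and all(k in data for k in output_schema):
            # Keep only the requested keys, in schema order.
            return {k: data[k] for k in output_schema}
    return None
```

Returning `None` after the last attempt leaves the caller to decide how to record a failed sample. Real model replies sometimes wrap the JSON in extra prose or a fenced block, so a production parser would likely need to strip that first; this sketch does not.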