Information Extractor

Extract structured data from unstructured text using an LLM.

Written By pvdyck

Last updated 18 minutes ago

The Information Extractor uses an LLM to pull structured fields from unstructured text such as emails, documents, support tickets, or web content.

How It Works

You define a JSON schema describing the fields you want to extract. The node sends the input text and schema to the LLM, which returns a structured JSON object matching your schema.
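
The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation: `build_prompt`, `extract`, and the `fake_llm` stand-in are hypothetical names, and the prompt wording is an assumption.

```python
import json
from typing import Callable


def build_prompt(text: str, schema: dict) -> str:
    """Combine the schema and the input text into one prompt (wording is illustrative)."""
    return (
        "Extract the fields described by this JSON Schema and reply with JSON only.\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Text:\n{text}"
    )


def extract(text: str, schema: dict, llm: Callable[[str], str]) -> dict:
    """Send the prompt to the model and parse its reply as a JSON object."""
    reply = llm(build_prompt(text, schema))
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from the model")
    return data


# Hypothetical stand-in for a real model call, so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return '{"name": "Jane Doe"}'


schema = {
    "type": "object",
    "properties": {"name": {"type": "string", "description": "Person's full name"}},
    "required": ["name"],
}
result = extract("Hi, I'm Jane Doe.", schema, fake_llm)
```

In a real workflow the connected AI Language Model sub-node plays the role of `fake_llm`; the point of the sketch is only that the schema and the text travel together and the reply is parsed as JSON.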

Parameters

Parameter | Description
Text | The input text to extract data from. Supports expressions like {{ $json.body }}.
Schema Type | How to define the extraction schema. Three options: From Attribute Descriptions (manually list fields and their purposes), From JSON Example (provide sample JSON; types and required fields are inferred), or From JSON Schema (write a full JSON Schema definition).
System Prompt | Optional instructions to guide extraction behavior (e.g., "Extract dates in ISO format"). The node automatically appends format-specific instructions to your prompt.
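
To make the three Schema Type options more concrete, the sketch below shows how the first two could be turned into the kind of JSON Schema that the third asks you to write by hand. The conversion rules are assumptions for illustration only (for example, treating every key of the JSON example as required); the node's own inference rules may differ. `schema_from_attributes`, `schema_from_example`, and `json_type` are hypothetical helpers.

```python
def schema_from_attributes(attributes: list) -> dict:
    """Build a schema from entries like {"name", "description", "type"?, "required"?}."""
    properties = {}
    required = []
    for attr in attributes:
        properties[attr["name"]] = {
            "type": attr.get("type", "string"),  # assumed default type
            "description": attr["description"],
        }
        if attr.get("required", False):
            required.append(attr["name"])
    return {"type": "object", "properties": properties, "required": required}


def json_type(value) -> str:
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def schema_from_example(example: dict) -> dict:
    """Infer a schema from sample JSON; every key is assumed to be required."""
    properties = {}
    for key, value in example.items():
        if isinstance(value, dict):
            properties[key] = schema_from_example(value)  # recurse into nested objects
        else:
            properties[key] = {"type": json_type(value)}
    return {"type": "object", "properties": properties, "required": list(example)}


by_attributes = schema_from_attributes([
    {"name": "name", "description": "Person's full name", "required": True},
    {"name": "email", "description": "Email address"},
])
by_example = schema_from_example({"name": "Jane Doe", "age": 34, "vip": False})
```

Attribute descriptions carry a purpose for each field but little type information, while a JSON example carries types but no descriptions; writing the JSON Schema yourself is the only option of the three that lets you specify both.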

Schema Example

{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Person's full name" },
    "email": { "type": "string", "description": "Email address" },
    "orderNumber": { "type": "string", "description": "Order or reference number" }
  },
  "required": ["name"]
}

Sub-Node Connections

Input | Required | Description
AI Language Model | Yes | The LLM powering the extraction (e.g., OpenRouter, OpenAI).
Output Parser | No | Additional output validation/formatting.

Example

Input: "Hi, I'm Jane Doe (jane@example.com). My order #98765 hasn't arrived."

Output:

{ "name": "Jane Doe", "email": "jane@example.com", "orderNumber": "98765" }
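
Since a model's reply may not always match the schema, it can be worth checking the result in a later step. The sketch below is a minimal hand-rolled check of the `required` and per-property `type` keywords, applied to the example output and the schema shown earlier. `check_output` is a hypothetical helper, not part of the node, and a full JSON Schema validator would cover far more.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Person's full name"},
        "email": {"type": "string", "description": "Email address"},
        "orderNumber": {"type": "string", "description": "Order or reference number"},
    },
    "required": ["name"],
}

# JSON Schema type names mapped to Python types.
# Note: Python's bool subclasses int, so True would pass as "integer" here;
# a stricter check would exclude it.
PY_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_output(output: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    for field in schema.get("required", []):
        if field not in output:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        expected = PY_TYPES.get(spec.get("type"))
        if field in output and expected is not None and not isinstance(output[field], expected):
            problems.append(f"{field}: expected {spec['type']}")
    return problems


output = json.loads('{ "name": "Jane Doe", "email": "jane@example.com", "orderNumber": "98765" }')
problems = check_output(output, SCHEMA)
missing = check_output({"email": "jane@example.com"}, SCHEMA)
```

Here `problems` is empty for the example output, while `missing` reports that `name` is absent. Note that `orderNumber` is declared as a string, so the extracted value is "98765" rather than the number 98765.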

Tips

  • Add description fields to your schema properties; they help the LLM understand what to extract.
  • List critical fields under required in the schema so the output is expected to include them.
  • For complex extractions, add a System Prompt with examples of expected output.
  • Works well chained after an HTTP Request node to extract data from fetched web pages.
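
The tip about adding examples to the System Prompt can be illustrated with a small helper that formats input/output pairs into prompt text. The layout is one possible convention, not a format the node requires; `build_system_prompt` is a hypothetical name and `deliveryDate` is an illustrative field.

```python
import json


def build_system_prompt(instructions: str, examples: list) -> str:
    """Append worked (input text, expected output) pairs to the base instructions."""
    lines = [instructions.strip(), "", "Examples:"]
    for text, expected in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {json.dumps(expected)}")
        lines.append("")  # blank line between examples
    return "\n".join(lines).rstrip()


system_prompt = build_system_prompt(
    "Extract dates in ISO format.",
    [("Delivered on March 3rd, 2024.", {"deliveryDate": "2024-03-03"})],
)
```

The resulting string can be pasted into the System Prompt parameter. Keeping the example outputs in the same shape as your schema gives the model a consistent target.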