ConvertDocumentContent
Overview
The ConvertDocumentContent Block converts the content of a file to a specified format using Pandoc. It is useful when downstream steps need the document in a particular format, for example Markdown for text processing or HTML for display.
Description
Converts file content to another format using pandoc.
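A conversion like this corresponds roughly to a single Pandoc invocation. The sketch below shows one way to make that call from Python with the standard library's `subprocess` module, assuming the `pandoc` executable is on `PATH`. The helper names `build_pandoc_args` and `convert_with_pandoc` are hypothetical and are not part of the Block's API; `--from`, `--to`, and the positional input path are standard Pandoc command-line arguments.

```python
import subprocess
from typing import Optional


def build_pandoc_args(path: str, to_format: str, from_format: Optional[str] = None) -> list[str]:
    """Build a pandoc command line (hypothetical helper, for illustration only)."""
    args = ["pandoc", "--to", to_format]
    if from_format is not None:
        # Without --from, pandoc guesses the input format from the file extension.
        args += ["--from", from_format]
    args.append(path)
    return args


def convert_with_pandoc(path: str, to_format: str, from_format: Optional[str] = None) -> str:
    """Run pandoc and return the converted content, which pandoc writes to stdout."""
    result = subprocess.run(
        build_pandoc_args(path, to_format, from_format),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits with a non-zero status
    )
    return result.stdout
```

Returning `stdout` as a string matches the Block's `str` output; no output file is written because `-o` is not passed.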
Metadata
- Category: Data
Configuration Options
Name | Data Type | Description | Default Value |
---|---|---|---|
to_format | PandocToFormats | Target format for the converted content. | PandocToFormats.MARKDOWN |
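The option and its default can be pictured as a small typed configuration object. In this sketch both `PandocToFormats` and `ConvertDocumentContentConfig` are hypothetical stand-ins: the real enum's members and values are not documented on this page, so mapping each member to a Pandoc writer name (`markdown`, `html`) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class PandocToFormats(str, Enum):
    """Hypothetical stand-in; values are assumed to be pandoc writer names."""
    MARKDOWN = "markdown"
    HTML = "html"


@dataclass
class ConvertDocumentContentConfig:
    """Hypothetical config object mirroring the to_format option and its default."""
    to_format: PandocToFormats = PandocToFormats.MARKDOWN


config = ConvertDocumentContentConfig()
print(config.to_format.value)  # markdown
```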
Inputs
Name | Data Type | Description |
---|---|---|
file | File | The file whose content is converted. |
Outputs
Name | Data Type | Description |
---|---|---|
output | str | The file content converted to the target format. |
State Variables
No state variables available.
Example(s)
Example 1: Convert a document to Markdown
- Create a ConvertDocumentContent Block.
- Set the to_format configuration to PandocToFormats.MARKDOWN.
- Provide a file as input.
- The Block will output the file's content converted to Markdown.
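To preview what a Markdown conversion looks like, the same step can be reproduced directly with Pandoc. This sketch pipes HTML text into `pandoc` on standard input, assuming the executable is installed; the function name `html_to_markdown` is hypothetical. The exact Markdown produced (for example, heading style) can vary between Pandoc versions.

```python
import shutil
import subprocess


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to Markdown by piping it through pandoc (hypothetical helper)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    result = subprocess.run(
        ["pandoc", "--from", "html", "--to", "markdown"],
        input=html,  # pandoc reads from stdin when no input file is given
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if shutil.which("pandoc"):
    print(html_to_markdown("<h1>Title</h1><p>Some <em>emphasis</em>.</p>"))
```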
Example 2: Convert a DOCX document to HTML for web display
- Set up a ConvertDocumentContent Block.
- Set to_format to PandocToFormats.HTML.
- Provide a DOCX file as input. Pandoc cannot read PDF files, so a PDF is not a valid input here.
- The Block will output the content converted to HTML, suitable for embedding in a webpage.
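Pandoc has no PDF reader, so this sketch uses DOCX as the source format. By default, Pandoc's HTML writer produces a fragment (body content only), which is what you typically want when embedding the result in an existing page; adding `--standalone` produces a complete document with `<html>`, `<head>`, and `<body>`. Whether the Block passes `--standalone` is not documented here, so the hypothetical `html_args` helper below only illustrates the two Pandoc modes.

```python
def html_args(path: str, standalone: bool = False) -> list[str]:
    """Build pandoc arguments for DOCX-to-HTML conversion (hypothetical helper)."""
    args = ["pandoc", "--from", "docx", "--to", "html"]
    if standalone:
        # Full HTML document; omit this flag to get an embeddable fragment.
        args.append("--standalone")
    args.append(path)
    return args


print(html_args("report.docx"))
# ['pandoc', '--from', 'docx', '--to', 'html', 'report.docx']
```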
Error Handling
- If the file cannot be accessed or is in an unsupported format, the Block will raise an error.
- If the to_format value is invalid or names a format Pandoc cannot write, the Block will raise an error.
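Both failure cases can be caught before Pandoc runs. A minimal sketch, assuming the set of accepted output formats is known up front; the `SUPPORTED_TO_FORMATS` set and the `check_inputs` function are hypothetical, and `pandoc --list-output-formats` prints the full list of writers for the installed Pandoc.

```python
import os

# Hypothetical allow-list; the Block's real set of formats may differ.
SUPPORTED_TO_FORMATS = {"markdown", "html"}


def check_inputs(path: str, to_format: str) -> None:
    """Raise an error for a missing or unreadable file, or an unsupported target format."""
    if not os.path.isfile(path):
        # Also rejects directories and broken paths.
        raise FileNotFoundError(f"Input file not found: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"Input file is not readable: {path}")
    if to_format not in SUPPORTED_TO_FORMATS:
        raise ValueError(f"Unsupported output format: {to_format!r}")
```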
FAQ
What file formats are supported?
The Block supports any input format that Pandoc can read. Common formats include Markdown, HTML, and DOCX. Pandoc cannot read PDF files, so PDF is not a supported input format. Check Pandoc's documentation for a full list of supported formats.
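When the input format has to be chosen explicitly, a lookup from file extension to Pandoc reader name is one option. The mapping and the `reader_for` function below are hypothetical and deliberately incomplete; `pandoc --list-input-formats` prints the authoritative list for the installed version. PDF maps to nothing because Pandoc has no PDF reader.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, partial mapping from file extensions to pandoc reader names.
READERS = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".html": "html",
    ".htm": "html",
    ".docx": "docx",
    ".odt": "odt",
    ".rst": "rst",
    ".tex": "latex",
    ".epub": "epub",
}


def reader_for(path: str) -> Optional[str]:
    """Return the pandoc reader for a file, or None if none is known (e.g. PDF)."""
    return READERS.get(Path(path).suffix.lower())


print(reader_for("Report.DOCX"))  # docx
print(reader_for("scan.pdf"))     # None
```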
How can I customize the output format?
Set the to_format configuration to the desired output format, such as PandocToFormats.HTML or PandocToFormats.MARKDOWN.
What happens if the file is too large?
Very large files can be slow to convert or exceed the available memory, depending on the environment's capabilities. Test the conversion on a smaller file first before processing large documents.
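One way to avoid surprises is to check the file size before starting a conversion. The 50 MiB limit and the `check_size` name below are hypothetical; choose a limit that suits your environment.

```python
import os

MAX_BYTES = 50 * 1024 * 1024  # hypothetical limit: 50 MiB


def check_size(path: str, max_bytes: int = MAX_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes} bytes")
    return size
```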