Solids are the functional unit of work in Dagster. A solid's responsibility is to read its inputs, perform an action, and emit outputs. Multiple solids can be connected to create a Pipeline.
Name | Description |
---|---|
@solid | The decorator used to define solids. The decorated function is called the compute_fn. The decorator returns a SolidDefinition |
InputDefinition | InputDefinitions define the inputs to a solid compute function. These are defined on the input_defs argument to the @solid decorator |
OutputDefinition | OutputDefinitions define the outputs of a solid compute function. These are defined on the output_defs argument to the @solid decorator |
SolidDefinition | Base class for solids. You almost never want to instantiate this class directly. Instead, you should use the @solid decorator, which returns a SolidDefinition |
Solids are used to define computations. Solids can later be assembled into Pipelines. Solids generally perform one specific action and are used for batch computations. For example, you can use a solid to:
By default, all solids in a pipeline execute in the same process. In production environments, Dagster is usually configured so that each solid executes in its own process.
Solids have several important properties:
To define a solid, use the @solid decorator. The decorated function is called the compute_fn and must have context as the first argument. The context provides access to important properties and objects, such as solid configuration and resources.
from dagster import solid

@solid
def my_solid(context):
    return "hello"
Each solid has a set of inputs and outputs, which define the data it consumes and produces. Inputs and outputs are used to define dependencies between solids and to pass data between solids.
Both definitions have a few important properties, including an IOManager, which defines how the output or input is stored and loaded. See the IOManager concept page for more info.
Inputs are passed as arguments to a solid's compute_fn. The value of an input can be passed from the output of another solid, or stubbed (hardcoded) using config.
The most common way to define inputs is just to add arguments to the decorated function:
from dagster import solid

@solid
def my_input_solid(context, abc, xyz):
    pass
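As a rough sketch of stubbing inputs through config: assuming the legacy run config layout, in which each solid's entries live under a top-level solids key and an untyped input is supplied as a {"value": ...} mapping, the inputs of my_input_solid could be hardcoded as shown here. The literal values are hypothetical, and the exact shape of an input entry depends on the input's Dagster Type, so treat this as illustrative rather than authoritative.

```python
# Hypothetical run config that stubs both inputs of my_input_solid.
# Assumption: untyped inputs accept a {"value": ...} entry; typed inputs
# may expect a different shape.
stubbed_inputs_run_config = {
    "solids": {
        "my_input_solid": {
            "inputs": {
                "abc": {"value": 1},
                "xyz": {"value": "some string"},
            }
        }
    }
}
```

A dictionary like this would typically be passed as the run_config argument when executing the pipeline or a single solid.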
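A solid only starts to execute once all of its inputs have been resolved. An input can be resolved in two ways: the upstream output it depends on has been emitted, or the input has been stubbed through config.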
You can use a Dagster Type to provide a function that validates a solid's input every time the solid runs. In this case, you use InputDefinitions corresponding to the decorated function arguments.
from dagster import DagsterType, InputDefinition, solid

MyDagsterType = DagsterType(type_check_fn=lambda _, value: value % 2 == 0, name="MyDagsterType")

@solid(input_defs=[InputDefinition(name="abc", dagster_type=MyDagsterType)])
def my_typed_input_solid(context, abc):
    pass
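The type check function is plain Python, so it can be exercised on its own. The sketch below uses a hypothetical named function, is_even, in place of the lambda; unlike the lambda, it returns False for non-integer values instead of raising an error. Its first parameter stands for the type check context that Dagster passes in, which this sketch ignores.

```python
def is_even(_context, value):
    """Return True only for even integers.

    Hypothetical replacement for the lambda above; `_context` is the type
    check context supplied by Dagster and is ignored in this sketch.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value % 2 == 0


# Calling the function directly, with None standing in for the context.
print(is_even(None, 4))       # True
print(is_even(None, 3))       # False
print(is_even(None, "four"))  # False
```

Assuming the DagsterType signature used above, the function could be supplied as type_check_fn=is_even.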
Outputs are yielded from a solid's compute_fn. By default, all solids have a single output called "result".
When you have one output, you can return the output value directly.
from dagster import solid

@solid
def my_output_solid(context):
    return 5
To define multiple outputs, or to use a different output name than "result", you can provide OutputDefinitions to the @solid decorator.
When you have more than one output, you must yield an instance of the Output class to disambiguate between outputs.
from dagster import Output, OutputDefinition, solid

@solid(
    output_defs=[
        OutputDefinition(name="first_output"),
        OutputDefinition(name="second_output"),
    ],
)
def my_multi_output_solid(context):
    yield Output(5, output_name="first_output")
    yield Output(6, output_name="second_output")
Like inputs, outputs can also have Dagster Types.
The first parameter of a solid's compute_fn is the context object, which is an instance of SolidExecutionContext. The context provides access to solid configuration (context.solid_config), loggers (context.log), resources (context.resources), and the current run ID (context.run_id).
For example, to access the logger and log an info message:
from dagster import solid

@solid(config_schema={"name": str})
def context_solid(context):
    name = context.solid_config["name"]
    context.log.info(f"My name is {name}")
All definitions in Dagster expose a config_schema, making them configurable and parameterizable. The configuration system is explained in detail on Config Schema.
Solid definitions can specify a config_schema for the solid's configuration. The configuration is accessible through the solid context at runtime. Therefore, solid configuration can be used to specify solid behavior at runtime, making solids more flexible and reusable.
For example, we can define a solid where the API endpoint it queries is defined through its configuration:
import requests
from dagster import solid

@solid(config_schema={"api_endpoint": str})
def my_configurable_solid(context):
    api_endpoint = context.solid_config["api_endpoint"]
    data = requests.get(f"{api_endpoint}/data").json()
    return data
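The value for api_endpoint is supplied at execution time. As a sketch, assuming the legacy run config layout in which solid config lives under solids, then the solid name, then config, the configuration for my_configurable_solid might look like this. The URL is a placeholder, not a real endpoint.

```python
# Hypothetical run config for my_configurable_solid.
# Assumption: solid config lives under solids -> <solid name> -> config.
configurable_solid_run_config = {
    "solids": {
        "my_configurable_solid": {
            "config": {"api_endpoint": "https://example.com/api"}
        }
    }
}

# The value the solid would read via context.solid_config["api_endpoint"].
solid_config = configurable_solid_run_config["solids"]["my_configurable_solid"]["config"]
api_endpoint = solid_config["api_endpoint"]

# The URL the solid would then request.
print(f"{api_endpoint}/data")
```

Keeping the endpoint in config rather than in the function body lets the same solid target a different environment without a code change.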
Solids are used within a @pipeline. You can see more information on the Pipelines page. You can also execute a single solid, usually within a test context, using the execute_solid function. More information can be found at Testing single solid execution.
You may find the need to create utilities that help generate solids. In most cases, you should parameterize solid behavior by adding solid configuration. You should reach for the solid factory pattern only if you find yourself needing to vary the arguments to the @solid decorator or SolidDefinition themselves, since they cannot be modified based on solid configuration.
To create a solid factory, you define a function that returns a SolidDefinition, either directly or by decorating a function with the @solid decorator.
from dagster import InputDefinition, Nothing, solid

def x_solid(
    arg,
    name="default_name",
    input_defs=None,
    **kwargs,
):
    """
    Args:
        arg (any): One or more arguments used to generate the new solid.
        name (str): The name of the new solid.
        input_defs (list[InputDefinition]): Any input definitions for the new solid. Default: None.

    Returns:
        function: The new solid.
    """

    @solid(name=name, input_defs=input_defs or [InputDefinition("start", Nothing)], **kwargs)
    def _x_solid(context):
        # Solid logic here
        pass

    return _x_solid
Why is a solid called a "solid"? It is a long and meandering journey, from a novel concept, to a familiar acronym, and back to a word.
In a data management system, there are two broad categories of data: source data, meaning data entered directly by a user, gathered from an uncontrolled external system, or generated directly by a sensor; and computed data, meaning data created by computing on source data or on other computed data. Management of computed data is the primary concern of Dagster. Another name for computed data would be software-structured data, or SSD. Given that SSD is already a well-known acronym for Solid State Drives, we named our core concept for software-structured data a Solid.