HiveStructurePaths
Documentation for HiveStructurePaths.
HiveStructurePaths.HiveSchema
HiveStructurePaths.build_hive_path
HiveStructurePaths.find_hive_files
HiveStructurePaths.parse_hive_path
HiveStructurePaths.HiveSchema — Type
HiveSchema(; parsers::Dict, order::Vector, filename::String)
Defines the structure and parsing rules for a Hive file hierarchy.
Fields
parsers: Dict mapping key names to parsing functions
order: Vector defining the hierarchical order of keys in paths
filename: The target filename that appears in all Hive paths (one per schema)
HiveStructurePaths.build_hive_path — Method
build_hive_path(schema::HiveSchema, base_dir::AbstractString; kwargs...) → String
Construct Hive-style output path with consistent ordering.
Path structure follows schema order: base_dir/key1=<val1>/key2=<val2>/.../filename where filename comes from schema.filename.
Examples
const schema = HiveSchema(
    parsers = Dict{String, Function}(
        "criterion" => identity,
        "partition" => x -> parse(Int, x),
        "k" => x -> parse(Int, x)
    ),
    order = ["criterion", "partition", "k"],
    filename = "data.arrow"
)
build_hive_path(schema, "data/binned"; criterion="depth_iso", partition=1)
# → "data/binned/criterion=depth_iso/partition=1/data.arrow"
build_hive_path(schema, "data/cluster_assignments"; partition=2, criterion="depth_iso", k=10)
# → "data/cluster_assignments/criterion=depth_iso/partition=2/k=10/data.arrow"
# Note that the order is consistent with the previous one; the order of `kwargs` does not matter.
Arguments
base_dir: Base directory path
kwargs: Key-value pairs matching schema keys
Returns
Complete path string with Hive-style structure
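To illustrate the ordering rule described above, here is a minimal, self-contained sketch. It is not the package's implementation: `sketch_build_path` is a hypothetical helper that takes the key order and filename directly instead of a HiveSchema, and it assumes that keys not supplied as keyword arguments are simply skipped, as the first example suggests.

```julia
# Hypothetical stand-in for the documented behaviour, not the package source.
function sketch_build_path(order::Vector{String}, filename::String,
                           base_dir::AbstractString; kwargs...)
    segments = String[]
    for key in order
        sym = Symbol(key)
        # Assumption: keys that were not passed are left out of the path.
        haskey(kwargs, sym) || continue
        push!(segments, string(key, "=", kwargs[sym]))
    end
    # Segments are emitted in schema order, whatever order the kwargs came in.
    return join(vcat(String(base_dir), segments, filename), "/")
end

order = ["criterion", "partition", "k"]
p1 = sketch_build_path(order, "data.arrow", "data/binned"; partition=1, criterion="depth_iso")
p2 = sketch_build_path(order, "data.arrow", "data/binned"; criterion="depth_iso", partition=1)
println(p1)        # data/binned/criterion=depth_iso/partition=1/data.arrow
println(p1 == p2)  # true
```

Iterating over the schema order rather than over the keyword arguments is what makes the result independent of call-site argument order.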
HiveStructurePaths.find_hive_files — Method
find_hive_files(schema::HiveSchema, root_dir::AbstractString;
                validate_keys=[], error_if_empty=false) -> Vector{String}
Recursively find files that match the schema's filename AND structure.
Arguments
validate_keys: List of keys (e.g. [:criterion]) that MUST be present in the path for it to be considered valid.
error_if_empty: If true, throws an error if no matching files are found.
Returns
Sorted list of absolute paths.
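The search described above can be approximated with the standard library alone. The sketch below is only an illustration under stated assumptions: `sketch_find_files` and its `must_have` keyword are hypothetical names, keys are given as strings, and the structure check is reduced to "every listed key appears as a key=value directory segment". The real function may validate more than this.

```julia
# Hypothetical stand-in: collect files named `filename` whose directory
# components (relative to `root_dir`) contain a `key=value` segment for
# every key listed in `must_have`.
function sketch_find_files(root_dir::AbstractString, filename::AbstractString;
                           must_have::Vector{String}=String[])
    found = String[]
    for (dir, _, files) in walkdir(root_dir)
        filename in files || continue
        parts = splitpath(relpath(dir, root_dir))
        keys_present = [String(first(split(p, "="; limit=2))) for p in parts if occursin("=", p)]
        all(k -> k in keys_present, must_have) || continue
        push!(found, abspath(joinpath(dir, filename)))
    end
    return sort(found)
end

# Build a throwaway tree: one Hive-style directory and one unrelated one.
root = mktempdir()
good = joinpath(root, "criterion=depth_iso", "partition=1")
bad = joinpath(root, "scratch")
mkpath(good)
mkpath(bad)
touch(joinpath(good, "data.arrow"))
touch(joinpath(bad, "data.arrow"))

hits = sketch_find_files(root, "data.arrow"; must_have=["criterion"])
println(length(hits))  # 1: the file under scratch/ lacks a criterion= segment
```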
HiveStructurePaths.parse_hive_path — Method
parse_hive_path(schema::HiveSchema, path::AbstractString; required_keys=[]) → NamedTuple
Extract key-value pairs from Hive-style paths according to the schema.
Examples
const schema = HiveSchema(
    parsers = Dict{String, Function}(
        "criterion" => identity,
        "partition" => x -> parse(Int, x),
        "k" => x -> parse(Int, x)
    ),
    order = ["criterion", "partition", "k"],
    filename = "data.arrow"
)
parse_hive_path(schema, "data/binned/criterion=depth_iso/partition=1/data.arrow")
# → (criterion="depth_iso", partition=1, k=nothing)
parse_hive_path(schema, "data/cluster_assignments/criterion=depth_iso/partition=2/k=10/data.arrow")
# → (criterion="depth_iso", partition=2, k=10)
# Validate required keys
parse_hive_path(schema, "data/binned/criterion=depth_iso/partition=1/data.arrow"; required_keys=["criterion", "partition"])
# → (criterion="depth_iso", partition=1, k=nothing)
Arguments
path: Path string containing Hive-style key=value segments
required_keys: Optional list of keys that must be present (default: [])
Returns
NamedTuple with extracted values (nothing for missing fields)
Throws
ErrorException if any required_keys are missing from the path
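The parsing and validation behaviour documented here can be sketched without the package. This is an assumption-laden illustration, not the actual implementation: `sketch_parse_path` and its `required` keyword are hypothetical, the parsers and order are passed directly rather than through a HiveSchema, and path segments are assumed to be separated by "/".

```julia
# Hypothetical stand-in for the documented behaviour, not the package source.
function sketch_parse_path(parsers::Dict{String,Function}, order::Vector{String},
                           path::AbstractString; required::Vector{String}=String[])
    raw = Dict{String,String}()
    for segment in split(path, "/")
        occursin("=", segment) || continue
        key, value = split(segment, "="; limit=2)
        # Only keys known to the schema are kept.
        haskey(parsers, key) && (raw[String(key)] = String(value))
    end
    missing_keys = [k for k in required if !haskey(raw, k)]
    isempty(missing_keys) || error("missing required keys: ", join(missing_keys, ", "))
    # Keys absent from the path map to `nothing`; present ones go through their parser.
    vals = [haskey(raw, k) ? parsers[k](raw[k]) : nothing for k in order]
    return NamedTuple{Tuple(Symbol.(order))}(Tuple(vals))
end

parsers = Dict{String,Function}(
    "criterion" => identity,
    "partition" => x -> parse(Int, x),
    "k" => x -> parse(Int, x),
)
order = ["criterion", "partition", "k"]
path = "data/binned/criterion=depth_iso/partition=1/data.arrow"

r = sketch_parse_path(parsers, order, path)
println(r.partition)     # 1
println(r.k === nothing) # true

# Requiring a key that the path lacks raises an ErrorException.
threw = try
    sketch_parse_path(parsers, order, path; required=["k"])
    false
catch err
    err isa ErrorException
end
println(threw)           # true
```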