feat: Add file_converter extension module (Issue #54)#56
vaibhav45sktech wants to merge 9 commits into dbpedia:main
Conversation
- Create new file_converter.py extension module in databusclient/extensions/
- Implements FileConverter class with streaming pipeline support
- Supports gzip decompression with optional checksum validation
- Provides methods for compress_gzip_stream and decompress_gzip_stream
- Minimal version as suggested in issue to start with gzip + checksum
- Can be extended later to support more compression formats
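The streaming pipeline described above can be sketched as follows. This is a minimal illustration of the described API using only the stdlib, assuming chunked reads with an optional SHA-256 hasher; the real module's constants and signatures may differ.

```python
import gzip
import hashlib
import io
from typing import BinaryIO, Optional

CHUNK_SIZE = 64 * 1024  # illustrative; the module's actual chunk size may differ


def decompress_gzip_stream(
    input_stream: BinaryIO,
    output_stream: BinaryIO,
    validate_checksum: bool = False,
) -> Optional[str]:
    """Decompress a gzip stream chunk by chunk, optionally hashing the output."""
    hasher = hashlib.sha256() if validate_checksum else None
    with gzip.open(input_stream, "rb") as gz:
        while True:
            chunk = gz.read(CHUNK_SIZE)
            if not chunk:
                break
            if hasher is not None:
                hasher.update(chunk)
            output_stream.write(chunk)
    return hasher.hexdigest() if hasher else None


# Round-trip check on in-memory streams
raw = b"hello databus " * 100
compressed = io.BytesIO(gzip.compress(raw))
out = io.BytesIO()
digest = decompress_gzip_stream(compressed, out, validate_checksum=True)
assert out.getvalue() == raw
assert digest == hashlib.sha256(raw).hexdigest()
```

Because the loop only holds one chunk in memory at a time, the same shape works for files far larger than RAM.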
📝 Walkthrough

Adds a new streaming FileConverter extension providing multi-format detect/convert/compress/decompress (gzip, bz2, xz, optional zstd) and checksum utilities; integrates it into the download flow and CLI (new --decompress flag), updates package exports, and adds comprehensive tests for conversion behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant Download as Download API
    participant Vault as Vault/Auth
    participant Remote as Remote Server
    participant Converter as FileConverter
    participant FS as Local filesystem
    CLI->>Download: request download (convert_to / decompress)
    Download->>Vault: request token exchange (if required)
    Vault-->>Download: bearer token
    Download->>Remote: GET artifact (with token if any) [stream]
    Remote-->>Download: streamed bytes
    Download->>Converter: stream bytes + source_format, target_format, validate_checksum
    Converter-->>Download: converted stream / checksum result
    Download->>FS: write final file (or temp + cleanup)
    FS-->>CLI: success / error
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 4

❌ Failed checks (4 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@databusclient/extensions/file_converter.py`:
- Around line 38-50: The code creates source_hasher from expected_checksum but
never uses it, so remove the unused expected_checksum parameter and the
source_hasher/hash-from-compressed-stream logic from the FileConverter method
that reads gzip (the block using gzip.open, FileConverter.CHUNK_SIZE, hasher,
and output_stream), keep only validate_checksum-driven hasher for decompressed
chunks, and update the method signature and return value accordingly; then
update callers to perform checksum validation on the compressed input stream
(e.g., via validate_checksum_stream) before calling this decompression routine.
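The validate-then-decompress pattern the prompt describes can be sketched with stdlib pieces. This is an illustration only, assuming a seekable stream (the helper name `sha256_of_stream` is hypothetical, not from the PR):

```python
import gzip
import hashlib
import io

CHUNK_SIZE = 64 * 1024


def sha256_of_stream(stream) -> str:
    """Hash a stream from its current position to EOF."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()


raw = b"payload bytes " * 50
compressed_bytes = gzip.compress(raw)
expected = hashlib.sha256(compressed_bytes).hexdigest()

stream = io.BytesIO(compressed_bytes)
# Step 1: validate the checksum of the *compressed* input ...
assert sha256_of_stream(stream) == expected
# Step 2: ... then rewind and decompress
stream.seek(0)
with gzip.open(stream, "rb") as gz:
    assert gz.read() == raw
```

Keeping validation on the compressed bytes means the published checksum can be verified without ever decompressing untrusted data first.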
🧹 Nitpick comments (1)
databusclient/extensions/file_converter.py (1)
70-103: `seek(0)` assumes a seekable stream; return type is misleading. Two concerns:

1. Line 88: `input_stream.seek(0)` will raise on non-seekable streams (e.g., network response bodies, pipes). Since the PR objective targets integration with the download pipeline, callers will need to be aware of this constraint. Consider either documenting the seekable requirement, removing the `seek(0)` call (let the caller manage stream position), or accepting a non-seekable stream and removing the seek.

2. Return type: the method signature says `-> bool` but it never returns `False` — it either returns `True` or raises `IOError`. Consider returning `bool` without raising (let the caller decide), or changing the return type to `None` and only raising on failure. Pick one contract and be consistent.

Suggested approach: remove the seek, return `bool` without raising:

```diff
 @staticmethod
 def validate_checksum_stream(
     input_stream: BinaryIO, expected_checksum: str
 ) -> bool:
     """Validate SHA256 checksum of a stream.

     Args:
-        input_stream: Input stream to validate
+        input_stream: Input stream to validate (must be positioned at start)
         expected_checksum: Expected SHA256 checksum

     Returns:
-        True if checksum matches
+        True if checksum matches, False otherwise
-
-    Raises:
-        IOError: If checksum validation fails
     """
     hasher = hashlib.sha256()
-    input_stream.seek(0)
-
     while True:
         chunk = input_stream.read(FileConverter.CHUNK_SIZE)
         if not chunk:
             break
         hasher.update(chunk)
-
     computed = hasher.hexdigest()
-    if computed.lower() != expected_checksum.lower():
-        raise IOError(
-            f"Checksum mismatch: expected {expected_checksum}, "
-            f"got {computed}"
-        )
-
-    return True
+    return computed.lower() == expected_checksum.lower()
```
Actionable comments posted: 2
🧹 Nitpick comments (3)
databusclient/extensions/file_converter.py (3)
102-105: `IOError` is semantically incorrect for a checksum mismatch; consider `ValueError`.

`IOError` (aliased to `OSError` in Python 3) conventionally signals operating-system-level I/O failures (file not found, disk full, permission denied). A checksum mismatch is a data-integrity error — `ValueError` or a custom `ChecksumMismatchError` would let callers distinguish between a genuine I/O failure and bad data without catching all `OSError`s.

♻️ Proposed change:

```diff
-        raise IOError(
+        raise ValueError(
             f"Checksum mismatch: expected {expected_checksum}, "
             f"got {computed}"
         )
```

Also update the docstring:

```diff
-    Raises:
-        IOError: If checksum validation fails
+    Raises:
+        ValueError: If checksum validation fails
```

🤖 Prompt for AI Agents
- Raises: - IOError: If checksum validation fails + Raises: + ValueError: If checksum validation fails🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/extensions/file_converter.py` around lines 102 - 105, Replace the semantically incorrect IOError raised for checksum mismatches with a more appropriate exception: change the raise in file_converter.py (the checksum-checking block that currently raises IOError("Checksum mismatch: expected..., got ...")) to raise ValueError with the same message, or define and raise a custom ChecksumMismatchError class and use that instead; also update the surrounding function/class docstring (the docstring for the checksum verification routine in file_converter.py) to document the new exception type so callers know to catch ValueError or ChecksumMismatchError.
21-21: `validate_checksum` parameter name should be `compute_checksum`.

The parameter only computes and returns the digest — it performs no comparison. The docstring correctly describes the behavior as "compute", but the parameter name implies validation. The past review had proposed this rename; it wasn't carried through.

♻️ Proposed rename:

```diff
 def decompress_gzip_stream(
     input_stream: BinaryIO,
     output_stream: BinaryIO,
-    validate_checksum: bool = False,
+    compute_checksum: bool = False,
 ) -> Optional[str]:
     """Decompress gzip stream with optional checksum computation.
     ...
-        validate_checksum: Whether to compute a SHA-256 checksum of
+        compute_checksum: Whether to compute a SHA-256 checksum of
             the decompressed output.
     ...
     """
-    hasher = hashlib.sha256() if validate_checksum else None
+    hasher = hashlib.sha256() if compute_checksum else None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/extensions/file_converter.py` at line 21, Rename the parameter validate_checksum to compute_checksum throughout the file to match behaviour described in the docstring: update the function/method signature(s) that currently declare validate_checksum (and any default value False) to compute_checksum: bool = False, update all internal variable references and any return/tuple keys or comments that use validate_checksum to compute_checksum, and update any call sites inside databusclient/extensions/file_converter.py (and its unit tests if present) so callers use the new name; ensure type hints, docstring example/parameter list, and any logging/messages reflect the new name.
12-107: No tests provided for the new module.

The PR adds a non-trivial streaming pipeline but no unit tests. At minimum, these cases should be covered:

- Round-trip: compress → decompress restores original bytes.
- `validate_checksum_stream` passes on a correct hash and raises on a bad hash.
- `decompress_gzip_stream` with `validate_checksum=True` returns the correct hex digest.
- Non-seekable stream handling for `validate_checksum_stream`.

Would you like me to generate a `tests/test_file_converter.py` skeleton covering the above cases, or open a follow-up issue to track this?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/extensions/file_converter.py` around lines 12 - 107, Add unit tests for the FileConverter class to cover the streaming pipeline: create tests/test_file_converter.py and include (1) a round-trip test that writes random bytes, uses FileConverter.compress_gzip_stream to compress into a buffer and then FileConverter.decompress_gzip_stream to decompress and assert original bytes are restored; (2) tests for validate_checksum_stream that assert True on a correct SHA-256 and that an IOError is raised on a bad hash; (3) a test that calls FileConverter.decompress_gzip_stream with validate_checksum=True and asserts the returned hex digest equals the SHA-256 of the decompressed bytes; and (4) a test for non-seekable input to validate_checksum_stream using a custom non-seekable wrapper (or io.BufferedReader over a pipe-like object) to ensure validation still works without calling seek; use BinaryIO-compatible buffers (io.BytesIO) and reference FileConverter.compress_gzip_stream, FileConverter.decompress_gzip_stream, and FileConverter.validate_checksum_stream in assertions.
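The round-trip and digest contracts listed above can be exercised with stdlib stand-ins while the FileConverter API settles. In this sketch the local `roundtrip` helper is a hypothetical stand-in; real tests would call `FileConverter.compress_gzip_stream` and `FileConverter.decompress_gzip_stream` instead:

```python
import gzip
import hashlib
import io


def roundtrip(data: bytes) -> bytes:
    """Compress then decompress via in-memory streams (stand-in for the
    compress_gzip_stream -> decompress_gzip_stream pipeline)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(data)
    buf.seek(0)
    with gzip.open(buf, "rb") as gz:
        return gz.read()


def test_roundtrip_restores_original():
    data = b"\x00\x01 binary payload " * 200
    assert roundtrip(data) == data


def test_digest_matches_decompressed_bytes():
    data = b"checksum me"
    expected = hashlib.sha256(data).hexdigest()
    assert hashlib.sha256(roundtrip(data)).hexdigest() == expected


test_roundtrip_restores_original()
test_digest_matches_decompressed_bytes()
```

Under pytest the two `test_*` functions would be collected automatically; the explicit calls at the bottom just make the sketch self-checking when run directly.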
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@databusclient/extensions/file_converter.py`:
- Around line 91-107: validate_checksum_stream currently reads to EOF and
returns without resetting the stream, which breaks the validate-then-decompress
pattern used by decompress_gzip_stream; change validate_checksum_stream to seek
input_stream back to 0 before returning and update its docstring to state that
input_stream must be seekable (and that the function will reset the stream
position), so callers like FileConverter.decompress_gzip_stream can safely read
from the start after validation.
- Around line 12-13: The package API currently doesn't export FileConverter from
the extensions package, forcing consumers to import
databusclient.extensions.file_converter.FileConverter directly; update
extensions/__init__.py to import and export FileConverter so it can be accessed
as databusclient.extensions.FileConverter (e.g., add "from .file_converter
import FileConverter" and include it in __all__), or document that only direct
module imports are supported—modify the __init__ export to reference the
FileConverter class name to restore the expected package-level import.
---
Nitpick comments:
In `@databusclient/extensions/file_converter.py`:
- Around line 102-105: Replace the semantically incorrect IOError raised for
checksum mismatches with a more appropriate exception: change the raise in
file_converter.py (the checksum-checking block that currently raises
IOError("Checksum mismatch: expected..., got ...")) to raise ValueError with the
same message, or define and raise a custom ChecksumMismatchError class and use
that instead; also update the surrounding function/class docstring (the
docstring for the checksum verification routine in file_converter.py) to
document the new exception type so callers know to catch ValueError or
ChecksumMismatchError.
- Line 21: Rename the parameter validate_checksum to compute_checksum throughout
the file to match behaviour described in the docstring: update the
function/method signature(s) that currently declare validate_checksum (and any
default value False) to compute_checksum: bool = False, update all internal
variable references and any return/tuple keys or comments that use
validate_checksum to compute_checksum, and update any call sites inside
databusclient/extensions/file_converter.py (and its unit tests if present) so
callers use the new name; ensure type hints, docstring example/parameter list,
and any logging/messages reflect the new name.
- Around line 12-107: Add unit tests for the FileConverter class to cover the
streaming pipeline: create tests/test_file_converter.py and include (1) a
round-trip test that writes random bytes, uses
FileConverter.compress_gzip_stream to compress into a buffer and then
FileConverter.decompress_gzip_stream to decompress and assert original bytes are
restored; (2) tests for validate_checksum_stream that assert True on a correct
SHA-256 and that an IOError is raised on a bad hash; (3) a test that calls
FileConverter.decompress_gzip_stream with validate_checksum=True and asserts the
returned hex digest equals the SHA-256 of the decompressed bytes; and (4) a test
for non-seekable input to validate_checksum_stream using a custom non-seekable
wrapper (or io.BufferedReader over a pipe-like object) to ensure validation
still works without calling seek; use BinaryIO-compatible buffers (io.BytesIO)
and reference FileConverter.compress_gzip_stream,
FileConverter.decompress_gzip_stream, and FileConverter.validate_checksum_stream
in assertions.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@databusclient/extensions/file_converter.py`:
- Around line 102-109: The input_stream is only seeked back on success so when
the checksum check raises an IOError the stream stays at EOF; move the
input_stream.seek(0) to before the comparison of computed vs expected_checksum
(i.e., seek the input_stream right after computing computed = hasher.hexdigest()
and before comparing computed.lower() != expected_checksum.lower()) so the
stream is rewound regardless of an exception, and update the function docstring
to drop the word "successful" (so it documents that the stream will be rewound
even when a checksum mismatch occurs).
---
Duplicate comments:
In `@databusclient/extensions/file_converter.py`:
- Line 21: The parameter name validate_checksum is misleading because the
function only computes and returns a digest; rename the parameter to
compute_checksum across the FileConverter API in
databusclient/extensions/file_converter.py (update the function signature where
validate_checksum appears, any internal uses/branches that check it, and all
docstrings that reference "validate_checksum") and update all callers/tests to
use compute_checksum so the name matches behavior and the docstrings that
already say "compute" remain consistent.
Greetings @Integer-Ctrl, could you please review this PR whenever available. I might be mistaken in some of the changes, so your guidance and feedback would be very helpful. Thank you!
Hi @vaibhav45sktech,

This PR misses the actual task. You have introduced a class that can be used to convert compressions, but this logic is not built into the download pipeline anywhere — or have I overlooked it? In that case, it is code that would have no use. However, the base conversion and checksum validation are already implemented; see databus-python-client/databusclient/api/download.py, lines 439 to 467 in 704136a. The ticket #54 only contains ideas for improving the file conversion.
Actionable comments posted: 6
🧹 Nitpick comments (3)
databusclient/api/download.py (2)
164-166: Missing space in `return{}` (line 166).

Minor style nit; `return {}` is the PEP 8-consistent form.

Proposed fix:

```diff
-    return{}
+    return {}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/api/download.py` around lines 164 - 166, Fix the PEP8 spacing issue in databusclient/api/download.py by replacing the tight brace in the empty dict return with a space: change the `return{}` statement (in the block where `graph = jd` is set) to `return {}` so it conforms to the spacing convention.
366-419: "Streaming" path downloads the entire file to disk first, then converts.

The comment says "Streaming download + conversion in a single pass" (line 380), but the implementation writes the full compressed file to disk (lines 389-394), validates the checksum, then calls `_convert_compression_format`, which reads and re-writes. This is download-then-convert, not single-pass streaming. The approach is reasonable for checksum-on-compressed-bytes, but the comment is misleading.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@databusclient/api/download.py` around lines 366 - 419, The comment claiming "Streaming download + conversion in a single pass" is incorrect because the code (using response.iter_content to write to filename, computing checksum_hasher, then calling _convert_compression_format) downloads the full compressed file to disk and only then converts it; update the comment to accurately state that the implementation downloads the compressed object to a temp file, validates checksum (when validate_checksum and checksum_hasher are used), and then performs conversion via _convert_compression_format (or alternatively implement true streaming conversion that decompresses on the fly while preserving checksum of compressed bytes if you prefer that behavior); reference symbols: response.iter_content, checksum_hasher, validate_checksum, filename, _convert_compression_format, _get_converted_filename, and _should_convert_file.

databusclient/extensions/file_converter.py (1)
72-100: Docstring says magic "takes precedence if the extension is ambiguous", but the extension always wins when matched.

The extension is checked first and returned immediately (line 92). Magic bytes are only consulted when no extension matches. The docstring implies magic could override an extension match. Minor wording nit — consider: "Falls back to magic-number detection when no known extension is found."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/extensions/file_converter.py` around lines 72 - 100, The docstring for FileConverter.detect_format inaccurately states magic-number detection "takes precedence if the extension is ambiguous" while the implementation always returns on an extension match; update the docstring to reflect the actual behavior (extension-based detection first, magic-number detection used as a fallback when no known extension is found) or alternatively change the implementation to consult FileConverter.detect_format_by_magic before honoring an extension; reference the detect_format, detect_format_by_magic methods and COMPRESSION_EXTENSIONS to locate the logic to update.
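The extension-first, magic-fallback behavior described above can be sketched as follows. The tables here are illustrative only; the module's real `COMPRESSION_EXTENSIONS` and detection logic may differ.

```python
import gzip
import os
import tempfile
from typing import Optional

# Illustrative tables; the module's real constants may differ.
EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz", ".zst": "zstd"}
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def detect_format(path: str) -> Optional[str]:
    """Extension-based detection first; magic-number sniffing only as a fallback."""
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]  # extension wins outright when it matches
    try:
        with open(path, "rb") as fh:
            head = fh.read(6)
    except OSError:
        return None
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return None


# An extension match never consults the file's bytes:
assert detect_format("data.ttl.bz2") == "bz2"

# Magic fallback kicks in only when the extension is unknown:
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    f.write(gzip.compress(b"x"))
    path = f.name
assert detect_format(path) == "gzip"
os.unlink(path)
```

This ordering matches the reviewer's suggested docstring wording: extension first, magic only as a fallback.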
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@databusclient/api/download.py`:
- Around line 17-21: The import list in databusclient/api/download.py includes
unused symbols COMPRESSION_EXTENSIONS and COMPRESSION_MODULES which cause a Ruff
F401 failure; update the import from databusclient.extensions.file_converter to
only import FileConverter (remove COMPRESSION_EXTENSIONS and
COMPRESSION_MODULES) or, if those constants are needed later, reference them
where used or mark as used, ensuring the FileConverter import remains intact
(look for the import statement that currently lists FileConverter,
COMPRESSION_EXTENSIONS, COMPRESSION_MODULES).
- Line 1068: The print statement currently prints a literal "{}" instead of
interpolating databusURI; replace the incorrect print("QUERY {}",
databusURI.replace("\n", " ")) with a proper interpolated string (e.g., using an
f-string or .format) so the cleaned databusURI value is included in the log;
update the print call near the databusURI usage in databusclient/api/download.py
to use f"QUERY {databusURI.replace('\\n', ' ')}" (or equivalent) so the URI is
shown correctly.
- Line 95: The file mixes PEP 604 union syntax (str | None) and typing.Optional;
standardize to typing.Optional[str] for consistency by replacing all occurrences
of "str | None" with "Optional[str]" (including the function
_extract_checksum_from_node and the other annotated functions noted), and ensure
"from typing import Optional" is present in the top-level imports; update any
type hints like "int | None" similarly if present to keep a uniform style.
In `@databusclient/extensions/file_converter.py`:
- Around line 307-321: The try block in convert_file currently covers both the
read/write loop and os.remove(source_path), so if removing the source fails you
end up treating it as a conversion failure and deleting the target; to fix,
limit the try/except to only the conversion (the with _open_reader/_open_writer
and read/write loop) and move os.remove(source_path) outside that try (or into a
separate post-success block); keep the except behavior that removes target_path
if a conversion exception occurred, but do not remove target_path when source
deletion fails — instead handle source removal failure separately (log or raise
a distinct error) so successful outputs aren't deleted; refer to convert_file,
_open_reader, _open_writer, and FileConverter.CHUNK_SIZE to locate the code to
change.
- Around line 455-462: The zstd wrappers in _wrap_reader and _wrap_writer
currently call _zstd.ZstdDecompressor().stream_reader(...) and
_zstd.ZstdCompressor().stream_writer(...) with the default closefd=True which
closes caller-owned streams: update the calls in _wrap_reader (stream_reader)
and _wrap_writer (stream_writer) to pass closefd=False so the wrapper won't
close the underlying BinaryIO owned by the caller; keep existing behavior
elsewhere unchanged.
In `@tests/test_file_converter.py`:
- Around line 24-34: The test file imports several names that are never used
causing Ruff F401; remove the unused imports COMPRESSION_EXTENSIONS and
COMPRESSION_MODULES from databusclient.extensions.file_converter and remove
_detect_compression_format, _get_converted_filename, and
_convert_compression_format from databusclient.api.download so only
FileConverter and _should_convert_file (or whatever names the tests actually
use) remain imported, or alternatively reference those imported symbols in the
tests if they are intended to be tested; update the import block in
tests/test_file_converter.py to import only the symbols actually used (e.g.,
FileConverter and _should_convert_file) to resolve the linter error.
---
Nitpick comments:
In `@databusclient/api/download.py`:
- Around line 164-166: Fix the PEP8 spacing issue in
databusclient/api/download.py by replacing the tight brace in the empty dict
return with a space: change the `return{}` statement (in the block where `graph
= jd` is set) to `return {}` so it conforms to the spacing convention.
- Around line 366-419: The comment claiming "Streaming download + conversion in
a single pass" is incorrect because the code (using response.iter_content to
write to filename, computing checksum_hasher, then calling
_convert_compression_format) downloads the full compressed file to disk and only
then converts it; update the comment to accurately state that the implementation
downloads the compressed object to a temp file, validates checksum (when
validate_checksum and checksum_hasher are used), and then performs conversion
via _convert_compression_format (or alternatively implement true streaming
conversion that decompresses on the fly while preserving checksum of compressed
bytes if you prefer that behavior); reference symbols: response.iter_content,
checksum_hasher, validate_checksum, filename, _convert_compression_format,
_get_converted_filename, and _should_convert_file.
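If true single-pass streaming were preferred, the compressed-bytes checksum and the decompression can share one loop. A minimal stdlib sketch — illustrative only, not the PR's code — with the chunk generator standing in for `response.iter_content`:

```python
import gzip
import hashlib
import io
import zlib


def stream_decompress_with_checksum(chunks, output_stream) -> str:
    """Hash the *compressed* bytes while decompressing gzip in one pass."""
    hasher = hashlib.sha256()
    # wbits=47 (32 + 15) lets zlib auto-detect a gzip or zlib header
    decomp = zlib.decompressobj(wbits=47)
    for chunk in chunks:
        hasher.update(chunk)
        output_stream.write(decomp.decompress(chunk))
    output_stream.write(decomp.flush())
    return hasher.hexdigest()


raw = b"streamed payload " * 1000
compressed = gzip.compress(raw)
# stand-in for response.iter_content(chunk_size=1024)
chunks = (compressed[i:i + 1024] for i in range(0, len(compressed), 1024))
out = io.BytesIO()
digest = stream_decompress_with_checksum(chunks, out)
assert out.getvalue() == raw
assert digest == hashlib.sha256(compressed).hexdigest()
```

Because the digest is computed over the compressed bytes as they arrive, the published checksum can still be validated even though the file is never written to disk in compressed form.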
In `@databusclient/extensions/file_converter.py`:
- Around line 72-100: The docstring for FileConverter.detect_format inaccurately
states magic-number detection "takes precedence if the extension is ambiguous"
while the implementation always returns on an extension match; update the
docstring to reflect the actual behavior (extension-based detection first,
magic-number detection used as a fallback when no known extension is found) or
alternatively change the implementation to consult
FileConverter.detect_format_by_magic before honoring an extension; reference the
detect_format, detect_format_by_magic methods and COMPRESSION_EXTENSIONS to
locate the logic to update.
Actionable comments posted: 1
🧹 Nitpick comments (2)
databusclient/api/download.py (1)
137-137: Minor formatting issue: missing space before `{}`.

Style nit — add a space between `return` and `{}` for consistency.

🧹 Suggested fix:

```diff
-    return{}
+    return {}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/api/download.py` at line 137, In databusclient.api.download update the stray return statement formatting by inserting a space between the keyword and the empty dict so the statement reads "return {}"; locate the return{} occurrence in the download module (the return at the end of the function around line 137) and replace it with the properly spaced form to satisfy style consistency.

databusclient/extensions/file_converter.py (1)
427-431: Potential resource leak if zstd stream creation fails after file open.

If `_zstd.ZstdDecompressor()` or `stream_reader()` raises an exception after `fh = open(path, "rb")`, the file handle will not be closed. Consider using a try/except or ensuring the file is closed on failure.

🧹 Suggested fix:

```diff
 if fmt == "zstd" and _HAS_ZSTD:
     fh = open(path, "rb")
-    dctx = _zstd.ZstdDecompressor()
-    return dctx.stream_reader(fh)
+    try:
+        dctx = _zstd.ZstdDecompressor()
+        return dctx.stream_reader(fh)
+    except Exception:
+        fh.close()
+        raise
```

The same pattern applies to `_open_writer` at lines 438-441.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@databusclient/extensions/file_converter.py` around lines 427 - 431, The zstd branch in the reader (and similarly in _open_writer) opens the file with fh = open(path, "rb") before creating _zstd.ZstdDecompressor() and calling stream_reader(), which can leak fh if those calls raise; change the flow in the zstd branch of the reader (and _open_writer) to either create the decompressor/stream reader inside a context manager that guarantees fh is closed on exception or open the file inside a try/finally and explicitly close fh on error (i.e., ensure fh is closed before re-raising), referencing the _HAS_ZSTD check, _zstd.ZstdDecompressor, and stream_reader (and the corresponding writer equivalents) so the file handle is never left open on failure.
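The close-on-failure pattern the prompt describes is generic, not zstd-specific. A stdlib sketch using `gzip.GzipFile` as a stand-in for the zstd wrapper (the function name `open_reader_safely` is hypothetical):

```python
import gzip
import os
import tempfile


def open_reader_safely(path: str):
    """Open the file, then close it explicitly if wrapper construction
    raises, so the descriptor never leaks (stand-in for the
    ZstdDecompressor().stream_reader(fh) pattern)."""
    fh = open(path, "rb")
    try:
        return gzip.GzipFile(fileobj=fh)
    except Exception:
        fh.close()
        raise


# Happy path: wrapper is constructed and reads the payload
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as f:
    f.write(gzip.compress(b"payload"))
    path = f.name

reader = open_reader_safely(path)
assert reader.read() == b"payload"
reader.close()
os.unlink(path)
```

The try/except adds nothing on the happy path; it only guarantees that a failure between `open()` and returning the wrapper cannot strand an open descriptor.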
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@databusclient/api/download.py`:
- Around line 1-17: Remove the unused hashlib import from
databusclient/api/download.py to fix the pipeline failure; locate the top-level
imports (including hashlib, json, os, typing, re, urllib.parse, requests,
SPARQLWrapper, tqdm and local imports like fetch_databus_jsonld,
get_databus_id_parts_from_file_url, compute_sha256_and_length and FileConverter)
and delete only the "import hashlib" line so checksum work continues to use
compute_sha256_and_length.
---
Nitpick comments:
In `@databusclient/api/download.py`:
- Line 137: In databusclient.api.download update the stray return statement
formatting by inserting a space between the keyword and the empty dict so the
statement reads "return {}"; locate the return{} occurrence in the download
module (the return at the end of the function around line 137) and replace it
with the properly spaced form to satisfy style consistency.
In `@databusclient/extensions/file_converter.py`:
- Around line 427-431: The zstd branch in the reader (and similarly in
_open_writer) opens the file with fh = open(path, "rb") before creating
_zstd.ZstdDecompressor() and calling stream_reader(), which can leak fh if those
calls raise; change the flow in the zstd branch of the reader (and _open_writer)
to either create the decompressor/stream reader inside a context manager that
guarantees fh is closed on exception or open the file inside a try/finally and
explicitly close fh on error (i.e., ensure fh is closed before re-raising),
referencing the _HAS_ZSTD check, _zstd.ZstdDecompressor, and stream_reader (and
the corresponding writer equivalents) so the file handle is never left open on
failure.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- databusclient/api/download.py
- databusclient/cli.py
- databusclient/extensions/file_converter.py
- tests/test_compression_conversion.py
- tests/test_file_converter.py
Greetings @Integer-Ctrl, could you please review the PR.
Description
This PR adds a new `file_converter.py` extension module to address Issue #54. The module provides a streaming pipeline for file format conversion with support for gzip decompression and checksum validation.

Changes

- Adds the `databusclient/extensions/file_converter.py` module
- `FileConverter` class with streaming support
- `decompress_gzip_stream()` method with optional checksum validation
- `compress_gzip_stream()` method for gzip compression
- `validate_checksum_stream()` method for SHA256 checksum validation

Related Issues
Fixes #54
Type of change
Summary by CodeRabbit