Gurlon Docs¶
Overview¶
gurlon is a library designed to make the process of exporting data from DynamoDB to your local filesystem easier.
Key Concepts
There are 3 main steps to the gurlon export process (an end-to-end sketch follows this list):
- Instantiate a new DataExporter and invoke export_data to begin a DynamoDB point-in-time export (ExportTableToPointInTime) to S3
- Call the DataExporter method download_data once the DynamoDB export is complete to combine the exported data into a single JSON file on your local filesystem
- Transform your local copy of the exported table data into another storage format: csv, parquet, duckdb, or a SQLite table
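Put together, the whole flow looks roughly like this. This is a minimal sketch using the same example region, table, and bucket names as the rest of this page; it assumes the same DataExporter instance keeps the export ARN returned by export_data, and the export runs asynchronously on the AWS side, so the download step only works once the export is complete.
from pathlib import Path
from gurlon.processor import DataExporter, DataTransformer
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
export_arn = exporter.export_data()  # starts an async DynamoDB export to S3
# ... wait for the DynamoDB export to finish before downloading ...
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
exporter.download_data(download_dir=download_dir)  # combine the exported files into one JSON file
transformer = DataTransformer(download_dir / "combined_data.json")
transformer.to_csv()  # or to_parquet(), to_duckdb(), to_sqlmodel(...)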
Installation¶
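Assuming the package is published to PyPI under the same name, it can be installed with pip:
pip install gurlon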
Export Data from DynamoDB to S3¶
In order to eventually run SQL queries on your DynamoDB table data, it first needs to be exported to S3.
PITR Must be Enabled
Your DynamoDB table needs to have point-in-time recovery enabled in order to perform ExportTableToPointInTime operations.
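If PITR is not enabled yet, one way to turn it on is with boto3 directly. This is a plain AWS call, not part of gurlon; the table name below is the example used throughout this page.
import boto3
dynamodb = boto3.client("dynamodb", region_name="us-west-1")
# Enable point-in-time recovery so ExportTableToPointInTime can run against the table
dynamodb.update_continuous_backups(
    TableName="gurlon-table",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)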
Create a DataExporter¶
Import the DataExporter class into your Python file, and create a DataExporter instance by passing the following parameters:
- aws_region: str
- table_name: str
- bucket_name: str
from gurlon.processor import DataExporter
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
Provide AWS Credentials¶
Make sure the environment this code is executing in supplies your AWS credentials through either:
- Environment variables - AWS Docs Reference
- The ~/.aws/config file - AWS Docs Reference
Additional Details on Authentication Process
Gurlon uses boto3 to perform AWS operations, so you can read up more on the underlying authentication process here.
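For example, the environment variable route uses the standard variable names boto3 looks for; the values below are placeholders:
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="us-west-1"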
Trigger the Export¶
Call the export_data function to begin exporting your table data to S3.
If the operation succeeds, the export ARN will be returned.
from gurlon.processor import DataExporter
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
export_arn = exporter.export_data()
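The export itself runs asynchronously on the AWS side. One way to check whether it has finished, using boto3 directly rather than anything gurlon provides, is the DescribeExport API with the ARN returned above:
import boto3
dynamodb = boto3.client("dynamodb", region_name="us-west-1")
# ExportStatus will be IN_PROGRESS, COMPLETED, or FAILED
description = dynamodb.describe_export(ExportArn=export_arn)["ExportDescription"]
print(description["ExportStatus"])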
Download Exported Data¶
Once your table export to S3 is complete, you can download the data to your local filesystem.
from pathlib import Path
from gurlon.processor import DataExporter
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
exporter.table_export_arn = "YOUR:TABLE:EXPORT:ARN"
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
exporter.download_data(download_dir=download_dir)
Output¶
Gurlon takes care of decompressing the exported data and combining it into a valid JSON file. This combined JSON file is stored inside the download_dir you specified previously.
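Since the combined file is valid JSON, you can inspect it with nothing but the standard library. The combined_data.json filename matches the later examples on this page; the shape of the loaded data depends on your table.
import json
from pathlib import Path
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
with combined_data.open() as f:
    items = json.load(f)
print(type(items), len(items))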
Transform the Data to Different File Types¶
Success
Now that the exported data is present locally, you can begin to transform it into different formats.
Create a DataTransformer¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
Parquet¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
parquet = transformer.to_parquet()
CSV¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
csv = transformer.to_csv()
DuckDB¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
duckdb = transformer.to_duckdb()
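Assuming to_duckdb returns the path to the database file it creates (that return value is an assumption, not something these docs state), you could open the file with the duckdb package to verify the contents:
import duckdb as ddb  # aliased to avoid clashing with the variable above
con = ddb.connect(str(duckdb))  # `duckdb` is the value returned by to_duckdb()
con.sql("SHOW TABLES").show()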
SQLite Table¶
from pathlib import Path
from sqlmodel import Field, SQLModel
from gurlon.processor import DataTransformer
class TableItemModel(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    user_id: str
    user_name: str
    email: str
    role: str
    full_name: str
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
sql = transformer.to_sqlmodel(TableItemModel)