Gurlon Docs¶
Overview¶
gurlon is a library designed to make the process of exporting data from DynamoDB to your local filesystem easier.
Key Concepts
There are 3 main steps to the gurlon export process (an end-to-end sketch follows this list):
- Instantiate a new DataExporter and invoke export_data to begin a DynamoDB point-in-time export (ExportTableToPointInTime) to S3
- Call the DataExporter method download_data once the DynamoDB export is complete to combine the exported data into a single JSON file on your local filesystem
- Transform your local copy of the exported table data into another storage format: csv, parquet, duckdb, or a SQLite table
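Put together, the whole flow looks roughly like this. This is a minimal sketch using the same example region, table, and bucket names as the rest of this page; it assumes the same DataExporter instance keeps the export ARN returned by export_data, and the export runs asynchronously on the AWS side, so the download step only works once the export is complete.
from pathlib import Path
from gurlon.processor import DataExporter, DataTransformer
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
export_arn = exporter.export_data()  # starts an async DynamoDB export to S3
# ... wait for the DynamoDB export to finish before downloading ...
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
exporter.download_data(download_dir=download_dir)  # combine the exported files into one JSON file
transformer = DataTransformer(download_dir / "combined_data.json")
transformer.to_csv()  # or to_parquet(), to_duckdb(), to_sqlmodel(...)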
Installation¶
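Assuming the package is published to PyPI under the same name, it can be installed with pip:
pip install gurlon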
Export Data from DynamoDB to S3¶
In order to eventually run SQL queries on your DynamoDB table data, it first needs to be exported to S3.
PITR Must be Enabled
Your DynamoDB table needs to have point-in-time recovery enabled in order to perform ExportTableToPointInTime operations.
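If PITR is not enabled yet, one way to turn it on is with boto3 directly. This is a plain AWS call, not part of gurlon; the table name below is the example used throughout this page.
import boto3
dynamodb = boto3.client("dynamodb", region_name="us-west-1")
# Enable point-in-time recovery so ExportTableToPointInTime can run against the table
dynamodb.update_continuous_backups(
    TableName="gurlon-table",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)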
Create a DataExporter¶
Import the DataExporter class into your Python file, and create a DataExporter instance by passing the following parameters:
- aws_region: str
- table_name: str
- bucket_name: str
from gurlon.processor import DataExporter
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
Provide AWS Credentials¶
Make sure the environment this code is executing in supplies your AWS credentials through either:
- Environment variables - AWS Docs Reference
- The ~/.aws/config file - AWS Docs Reference
Additional Details on Authentication Process
Gurlon uses boto3 to perform AWS operations, so you can read up more on the underlying authentication process here.
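For example, the environment variable route uses the standard variable names boto3 looks for; the values below are placeholders:
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="us-west-1"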
Trigger the Export¶
Call the export_data function to begin exporting your table data to S3.
If the operation succeeds, the export ARN will be returned.
from gurlon.processor import DataExporter
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
export_arn = exporter.export_data()
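The export itself runs asynchronously on the AWS side. One way to check whether it has finished, using boto3 directly rather than anything gurlon provides, is the DescribeExport API with the ARN returned above:
import boto3
dynamodb = boto3.client("dynamodb", region_name="us-west-1")
# ExportStatus will be IN_PROGRESS, COMPLETED, or FAILED
description = dynamodb.describe_export(ExportArn=export_arn)["ExportDescription"]
print(description["ExportStatus"])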
Download Exported Data¶
Once your table export to S3 is complete, you can download the data to your local filesystem.
from pathlib import Path
from gurlon.processor import DataExporter
exporter = DataExporter("us-west-1", "gurlon-table", "gurlon-bucket")
exporter.table_export_arn = "YOUR:TABLE:EXPORT:ARN"
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
exporter.download_data(download_dir=download_dir)
Output¶
Gurlon takes care of decompressing the exported data and combining it into a valid JSON file. This combined JSON file is stored inside the download_dir you specified previously.
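Since the combined file is valid JSON, you can inspect it with nothing but the standard library. The combined_data.json filename matches the later examples on this page; the shape of the loaded data depends on your table.
import json
from pathlib import Path
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
with combined_data.open() as f:
    items = json.load(f)
print(type(items), len(items))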
Transform the Data to Different File Types¶
Success
Now that the exported data is present locally, you can begin to transform it into different formats.
Create a DataTransformer¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
Parquet¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
parquet = transformer.to_parquet()
CSV¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
csv = transformer.to_csv()
DuckDB¶
from pathlib import Path
from gurlon.processor import DataTransformer
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
duckdb = transformer.to_duckdb()
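Assuming to_duckdb returns the path to the database file it creates (that return value is an assumption, not something these docs state), you could open the file with the duckdb package to verify the contents:
import duckdb as ddb  # aliased to avoid clashing with the variable above
con = ddb.connect(str(duckdb))  # `duckdb` is the value returned by to_duckdb()
con.sql("SHOW TABLES").show()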
SQLite Table¶
from pathlib import Path
from sqlmodel import Field, SQLModel
from gurlon.processor import DataTransformer
class TableItemModel(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    user_id: str
    user_name: str
    email: str
    role: str
    full_name: str
download_dir = Path.home() / "Downloads" / "dynamodb_exports"
combined_data = download_dir / "combined_data.json"
transformer = DataTransformer(combined_data)
sql = transformer.to_sqlmodel(TableItemModel)