Google Cloud Fundamentals: Core Infrastructure - Storage in the Cloud

Cloud Storage: Scalable and Durable Object Storage

Cloud Storage is Google Cloud's object storage solution designed for developers and IT organizations.

Object Storage Explained:

  • Manages data as "objects" with unique identifiers (URLs).
  • Differs from file and block storage by focusing on self-contained units with metadata.
  • Commonly stores unstructured data like videos, pictures, and audio.

Cloud Storage Features:

  • Highly Scalable: Accommodates any amount of data with on-demand retrieval.
  • Versatile Use Cases:
    • Serving website content
    • Archiving and disaster recovery
    • Distributing large data objects
    • Storing binary large objects (BLOBs) for online content (videos, photos)
    • Backup and archived data
    • Intermediate results in processing workflows
  • Organized by Buckets:
    • Globally unique names
    • Specified geographic locations for minimized latency (e.g., European region for European users)
  • Immutable Objects: Objects are never edited in place; each change writes a new version, which replaces the previous one unless versioning is enabled.
  • Versioning (Optional): Tracks all object changes, allowing restoration to previous states.
  • Access Control:
    • IAM roles and Access Control Lists (ACLs) ensure data security and privacy.
    • Grant least privilege access based on user needs.
  • Lifecycle Management: Define policies to automatically manage object lifecycles (deletion, archiving).
    • Examples:
      • Delete objects older than a year.
      • Delete objects created before a specific date.
      • Keep only the most recent versions (with versioning enabled).
  • Cost Optimization: Lifecycle management helps control storage costs by removing unnecessary data.
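
The three lifecycle examples above can be sketched as a configuration plus a small local evaluator. This is a minimal illustration, not the service's implementation: the dictionary follows the JSON shape used for bucket lifecycle rules (`age`, `createdBefore`, `numNewerVersions`), while the concrete values (365 days, 2020-01-01, 3 versions) and the `should_delete` helper are assumptions made up for this example.

```python
from datetime import date

# Lifecycle rules in the JSON shape used for bucket lifecycle configuration.
# The specific values are illustrative assumptions.
LIFECYCLE_CONFIG = {
    "rule": [
        # Delete objects older than a year.
        {"action": {"type": "Delete"}, "condition": {"age": 365}},
        # Delete objects created before a specific date.
        {"action": {"type": "Delete"}, "condition": {"createdBefore": "2020-01-01"}},
        # With versioning enabled, keep only the 3 most recent versions.
        {"action": {"type": "Delete"}, "condition": {"numNewerVersions": 3}},
    ]
}


def should_delete(created: date, newer_versions: int, today: date,
                  config: dict = LIFECYCLE_CONFIG) -> bool:
    """Hypothetical local evaluator: True if any Delete rule fully matches."""
    for rule in config["rule"]:
        if rule["action"]["type"] != "Delete":
            continue
        cond = rule["condition"]
        checks = []
        if "age" in cond:
            checks.append((today - created).days >= cond["age"])
        if "createdBefore" in cond:
            checks.append(created < date.fromisoformat(cond["createdBefore"]))
        if "numNewerVersions" in cond:
            checks.append(newer_versions >= cond["numNewerVersions"])
        # All conditions inside one rule must hold for the rule to apply.
        if checks and all(checks):
            return True
    return False
```

In the real service the rules are attached to the bucket and evaluated by Cloud Storage itself; the evaluator here only makes the matching logic visible.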

In summary, Cloud Storage offers a robust, scalable, and secure object storage solution for various data management needs. It provides granular control over access, versioning, and lifecycle management, ensuring cost-efficiency and data integrity.

Cloud Storage: Storage Classes and Data Management

Cloud Storage offers a variety of storage classes to cater to different data access needs and budget constraints. Here's a breakdown of the four primary classes:

1. Standard Storage:

  • Ideal for frequently accessed ("hot") data or data with short retention periods.
  • Offers high performance and low latency for retrieval.

2. Nearline Storage:

  • Cost-effective option for infrequently accessed data (read/modify on average once a month or less).
  • Examples: data backups, long-tail multimedia, data archiving.

3. Coldline Storage:

  • Low-cost storage for rarely accessed data (read/modify at most once every 90 days).
  • Offers a lower storage cost than Nearline Storage, in exchange for higher data access and operation costs.

4. Archive Storage:

  • Most economical option for long-term data archiving, online backup, and disaster recovery.
  • Ideal for data accessed less than once a year.
  • Lowest storage cost of the four classes, but higher data access and operation costs and a 365-day minimum storage duration.
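
As a rough sketch of how the access-frequency guidance above maps to a class, the helper below picks a class from the expected number of days between accesses. The function name and the exact cut-offs (30, 90, and 365 days) are assumptions derived from the descriptions in these notes, not an official rule; the returned strings are the storage class identifiers.

```python
def suggest_storage_class(expected_days_between_accesses: float) -> str:
    """Hypothetical helper: map access frequency to a storage class."""
    if expected_days_between_accesses < 30:
        return "STANDARD"   # frequently accessed ("hot") data
    if expected_days_between_accesses < 90:
        return "NEARLINE"   # about once a month or less
    if expected_days_between_accesses < 365:
        return "COLDLINE"   # at most once every 90 days
    return "ARCHIVE"        # less than once a year
```

For example, `suggest_storage_class(45)` returns `"NEARLINE"`. A real decision would also weigh retrieval costs and minimum storage durations.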

Common Characteristics Across All Classes:

  • Unlimited Storage: No minimum object size requirement, scales to accommodate any data volume.
  • Global Accessibility: Data is accessible from anywhere worldwide due to geographically distributed locations.
  • Durability and Performance: Low latency, high durability, and consistent experience across all classes.
  • Security: Data encryption at rest and in transit (using HTTPS/TLS) ensures data privacy.
  • Geo-redundancy: When data is stored in a multi-region or dual-region location, it is replicated across geographically diverse data centers for disaster protection and load balancing.
  • Pay-as-you-go Model: No minimum fees, only charged for used storage and operations.

Data Transfer Options:

  • gcloud storage command (Cloud SDK): Command-line tool for online data transfers.
  • Cloud Console Drag-and-Drop (Chrome browser): Web interface for uploading data.
  • Storage Transfer Service: Efficiently imports large datasets from other cloud providers, Cloud Storage regions, or HTTP(S) endpoints (supports scheduling and batch transfers).
  • Transfer Appliance: High-capacity storage server leased from Google Cloud for offline data transfer (up to 1 petabyte per appliance).
  • Integration with Other Services: Import/export capabilities with BigQuery, Cloud SQL, App Engine logs, Firestore backups, and objects used by App Engine and Compute Engine applications.

In summary, Cloud Storage provides a flexible and cost-effective solution for storing data of all types. With a range of storage classes, robust security features, and seamless integration with other Google Cloud services, it caters to diverse storage needs and simplifies data management.


Cloud SQL: Managed Relational Database Service

Cloud SQL is Google Cloud's fully managed relational database service offering for MySQL, PostgreSQL, and SQL Server. It frees you from administrative tasks like:

  • Patching and updates
  • Backup management
  • Replication configuration

Benefits:

  • Reduced Management Overhead: Focus on application development while Google handles database administration.
  • Scalability: Scales up to 128 CPU cores, 864 GB RAM, and 64 TB storage.
  • Automatic Replication: Supports replication from Cloud SQL primary instances, external primary instances, and external MySQL instances.
  • Managed Backups: Securely stores backups with easy restore capabilities. Seven backups are included in the instance cost.
  • Data Encryption: Encrypts data at rest and in transit for enhanced security.
  • Network Firewall: Manages network access to each database instance.
  • Accessibility: Accessible by other Google Cloud services and even external applications through standard drivers.
    • Works with App Engine (Connector/J, MySQLdb)
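    • Compute Engine instances can be authorized to access Cloud SQL instances; placing the Cloud SQL instance in the same zone as the VM reduces latency
    • Supports external administration tools (SQL Workbench, Toad, etc.) and external applications using standard MySQL drivers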

In summary, Cloud SQL simplifies database management by providing a scalable, secure, and fully managed relational database service. It empowers you to focus on building great applications while Google takes care of the underlying infrastructure.


Cloud Spanner: Scalable, Globally Consistent Relational Database

Cloud Spanner is Google Cloud's premium offering for relational database management. It provides a unique combination of features ideal for mission-critical applications:

  • Fully Managed Service: Google handles database administration tasks, freeing you to focus on application development.
  • Horizontal Scalability: Scales seamlessly to meet growing data demands.
  • Strong Consistency: Ensures all reads across globally distributed data are consistent, offering a single source of truth.
  • SQL Support: Leverages familiar SQL syntax for easy querying and data manipulation.

Key Use Cases:

  • SQL relational capabilities: Applications that need a relational database with joins and secondary indexes.
  • High availability: Minimizes downtime and ensures continuous operation.
  • Global consistency: Guarantees data consistency across geographically distributed deployments.
  • High throughput: Handles massive workloads with tens of thousands of reads/writes per second.

Proven Performance:

  • Cloud Spanner is battle-tested by Google's own large-scale applications, powering critical services that generate billions in revenue.

In summary, Cloud Spanner is a powerful relational database solution built for demanding workloads. Its strong consistency, scalability, and familiar SQL interface make it ideal for applications requiring high availability and real-time data access.


Firestore: Flexible and Scalable NoSQL Database for Mobile, Web, and Server Development

Firestore is Google Cloud's NoSQL database service designed for scalability and flexibility. It caters to mobile, web, and server development needs with features like:

  • Horizontal Scalability: Scales efficiently to handle growing data volumes.
  • Document-based Structure: Stores data in documents with key-value pairs, allowing for complex nested objects and subcollections.
  • Flexible Queries: Supports complex queries with chaining, filtering, sorting, and indexing for efficient data retrieval.
  • Offline Support: Caches data locally for offline access and synchronizes changes seamlessly when online.
  • Real-time Data Synchronization: Updates data across all connected devices automatically.
  • Strong Consistency: Guarantees data consistency across geographically distributed deployments.
  • Atomic Batch Operations: Ensures all operations within a batch succeed or fail together.
  • Real Transaction Support: Provides full transactional capabilities for data integrity.
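
The all-or-nothing behaviour of atomic batch operations can be illustrated with plain Python. This is a conceptual sketch only: it uses an in-memory dictionary instead of the Firestore client library, and the `apply_batch` function and the `("set" | "update" | "delete", doc_id, data)` tuple format are hypothetical names invented for this example.

```python
import copy


def apply_batch(store: dict, operations: list) -> dict:
    """Apply every operation or none.

    Works on a deep copy, so the original store is untouched
    if any operation in the batch raises.
    """
    working = copy.deepcopy(store)
    for op, doc_id, data in operations:
        if op == "set":
            working[doc_id] = data
        elif op == "update":
            if doc_id not in working:
                raise KeyError(f"cannot update missing document {doc_id!r}")
            working[doc_id] = {**working[doc_id], **data}
        elif op == "delete":
            working.pop(doc_id, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return working  # the caller swaps this in only after full success
```

A caller would write `users = apply_batch(users, ops)`; if any step fails, the exception propagates and `users` still holds the state from before the batch.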

Pricing:

  • Pay-per-use model based on:
    • Document reads, writes, and deletes
    • Queries (charged as document reads)
    • Data storage
    • Network egress (free in many cases)
  • Free tier includes:
    • 10 GiB of free network egress per month between US regions
    • Daily quotas: 50,000 document reads, 20,000 document writes, 20,000 document deletes
    • 1 GB of stored data
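
To make the free-tier arithmetic concrete, the sketch below subtracts the daily free quotas listed above from one day's usage. Only operation counts are computed; no prices are assumed. The `billable_operations` helper and the shape of the `usage` dictionary are hypothetical.

```python
# Daily free quotas for document operations, as listed in these notes.
FREE_DAILY_QUOTA = {"reads": 50_000, "writes": 20_000, "deletes": 20_000}


def billable_operations(usage: dict) -> dict:
    """Return the operations per type that exceed the daily free quota."""
    return {
        op: max(0, usage.get(op, 0) - free)
        for op, free in FREE_DAILY_QUOTA.items()
    }
```

A day with 60,000 reads and 5,000 writes would leave 10,000 billable reads and no billable writes or deletes.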

Benefits:

  • Ideal for building flexible applications with complex data structures.
  • Efficient offline capabilities for mobile and web applications.
  • Scalable architecture to handle demanding workloads.
  • Strong consistency and data integrity features for mission-critical applications.
  • Cost-effective with a free tier and pay-per-use model.

In summary, Firestore offers a powerful NoSQL solution for developers building scalable and performant applications with real-time data requirements.


Cloud Bigtable: Scalable NoSQL Database for Big Data

Cloud Bigtable is Google Cloud's high-performance NoSQL database service designed for handling massive datasets with low latency and high throughput. It powers critical Google services like Search, Analytics, Maps, and Gmail.

Ideal Use Cases:

  • Working with large datasets (over 1 TB) of semi-structured or structured data.
  • Handling fast-changing data with high throughput requirements.
  • Using NoSQL data models where strong relational semantics (joins, multi-row transactions) are not required.
  • Managing time-series data or data with natural ordering.
  • Performing big data analytics, including batch or real-time processing.
  • Running machine learning algorithms on large datasets.

Integration and Data Access:

  • Interacts with other Google Cloud services and third-party clients.
  • Supports data access through APIs and various client interfaces:
    • Managed VMs
    • HBase REST Server
    • Java Server using HBase client (data serving for applications, dashboards, and data services)
  • Enables data streaming using popular frameworks:
    • Dataflow Streaming
    • Spark Streaming
    • Storm
  • Supports batch data processing with:
    • Hadoop MapReduce
    • Dataflow
    • Spark
  • Data can be streamed in, processed, and written back to Bigtable or exported to downstream databases.

In summary, Cloud Bigtable is a powerful solution for big data workloads requiring high scalability, performance, and real-time capabilities. Its flexible data model and integration with various tools make it suitable for diverse big data analytics and application development needs.


Choosing the Right Google Cloud Storage Service

Selecting the optimal Google Cloud storage solution depends on your specific data needs and application requirements. Here's a breakdown to guide your decision:

Cloud Storage:

  • Ideal for: Storing large, immutable objects (images, videos) over 10 MB.
  • Features: Petabyte-scale capacity, maximum object size of 5 TB.

Cloud SQL or Cloud Spanner:

  • Ideal for: Online Transaction Processing (OLTP) systems requiring full SQL support.
  • Cloud SQL: Up to 64 TB storage (depending on machine type), suitable for web frameworks and existing applications (user credentials, orders).
  • Cloud Spanner: Petabyte-scale storage, ideal for horizontally scalable deployments beyond Cloud SQL's capabilities.

Firestore:

  • Ideal for: Massive scaling, real-time query results, offline query support.
  • Features: Terabyte-scale capacity, maximum document (entity) size of 1 MB.
  • Use cases: Mobile and web app data storage, synchronization, and querying.

Cloud Bigtable:

  • Ideal for: Storing large volumes of structured data (doesn't support SQL or multi-row transactions).
  • Features: Petabyte-scale capacity, maximum cell size of 10 MB, maximum row size of 100 MB.
  • Use cases: Analytical data with high read/write activity (AdTech, finance, IoT).

Additional Considerations:

  • You might combine multiple services for complex workflows.
  • BigQuery (not covered here) bridges data storage and processing, offering big data analysis and interactive querying capabilities.
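
The guidance in this section can be condensed into a small decision helper. This is a simplified sketch under stated assumptions: the function names, the boolean parameters, and the decimal interpretation of MB and TB are all invented for illustration, and a real choice would weigh more factors than these four flags.

```python
MB = 10**6   # decimal units assumed for this sketch
TB = 10**12

# Maximum size of a single stored unit, as listed in this section.
MAX_UNIT_BYTES = {
    "Cloud Storage": 5 * TB,    # maximum object size
    "Firestore": 1 * MB,        # maximum entity size
    "Cloud Bigtable": 10 * MB,  # maximum cell size
}


def suggest_service(*, is_blob: bool = False, needs_sql: bool = False,
                    needs_horizontal_scale: bool = False,
                    needs_realtime_or_offline: bool = False) -> str:
    """Hypothetical helper mirroring the breakdown above."""
    if is_blob:
        return "Cloud Storage"
    if needs_sql:
        return "Cloud Spanner" if needs_horizontal_scale else "Cloud SQL"
    if needs_realtime_or_offline:
        return "Firestore"
    return "Cloud Bigtable"


def fits(service: str, unit_bytes: int) -> bool:
    """Check a single object, entity, or cell against the listed limit."""
    return unit_bytes <= MAX_UNIT_BYTES[service]
```

For instance, `suggest_service(needs_sql=True, needs_horizontal_scale=True)` returns `"Cloud Spanner"`, and `fits("Firestore", 2 * MB)` returns `False` because the value exceeds the 1 MB entity limit.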

In essence, Google Cloud provides a diverse storage portfolio to cater to various data management needs. By understanding the strengths of each service, you can make informed decisions to optimize storage efficiency and application performance.