---
title: S3 Dynamic Storage
owner: Services
---
<strong><%= modified_date %></strong>
## <a id='overview'></a> Overview
Dynamic Storage, Swisscom's cloud storage, is a pay-per-use object storage service optimized for storing large data sets for your applications (for example backups, archives, or pictures). You can access your stored data through a RESTful API.
Swisscom Dynamic Storage supports the [Amazon S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html).
The Amazon S3 API is:
* Very well known in the market
* State of the art and the de facto standard
* Backed by a big community
* Supported by hundreds of scripts and tools
## <a id='integrating-your-service'></a> Integrating the Service With Your App
After you [create](../devguide/services/managing-services.html#create) a service instance and [bind](../devguide/services/application-binding.html#bind) it to an application, the [VCAP_SERVICES](../devguide/deploy-apps/environment-variable.html#VCAP-SERVICES) environment variable is created. It contains the service credentials:
```json
{
  "dynstrg": [
    {
      "credentials": {
        "accessHost": "ds11s3.swisscom.com",
        "namespaceHost": "<Namespace-ID>.ds11s3ns.swisscom.com",
        "accessKeyID": "<Namespace-AccountName>",
        "secretAccessKey": "40-Characters"
      },
      "label": "dynstrg",
      "name": "testecs",
      "plan": "usage",
      "tags": []
    }
  ]
}
```
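A Node.js app could pick these credentials up as follows (a minimal sketch using the AWS JS SDK, which is also used in the CORS example below):
```javascript
// Minimal sketch: read the Dynamic Storage credentials from
// VCAP_SERVICES and configure an S3 client with them.
const AWS = require('aws-sdk');

const vcap = JSON.parse(process.env.VCAP_SERVICES);
const creds = vcap.dynstrg[0].credentials;

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint(`https://${creds.accessHost}`),
  accessKeyId: creds.accessKeyID,
  secretAccessKey: creds.secretAccessKey
});
```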
## <a id='sample-application'></a> Sample Application
Swisscom: [Dynamic Storage How-To](https://github.com/swisscom/dynstrg-howto)
## <a id='delete'></a> Deleting a Service Instance
Before deleting a Dynamic Storage service instance, you must remove all of its buckets and objects. Otherwise, the deletion fails.
To do so, you can use the [S3cmd tool](http://s3tools.org/s3cmd) and run the following command for each bucket:
```shell
$ s3cmd del --recursive s3://bucket-name
```
If you don't have credentials for your Dynamic Storage instance, you can obtain them by creating a [service key](../devguide/services/service-keys.html).
Feel free to use our [bash helper script](https://github.com/swisscom/docs-appcloud-service-offerings/blob/master/scripts/s3cleanup.sh) to clean up large numbers of S3 instances, buckets, and files.
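If you prefer doing this from code rather than with S3cmd, the following sketch (bucket name hypothetical, unversioned bucket assumed) empties and deletes a bucket with the AWS JS SDK:
```javascript
// Sketch: remove all objects from a bucket, then delete the bucket.
// deleteObjects accepts at most 1000 keys per call, so loop until
// the listing is no longer truncated.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com'),
  s3ForcePathStyle: true
});

async function emptyAndDeleteBucket(bucket) {
  let page;
  do {
    page = await s3.listObjects({ Bucket: bucket, MaxKeys: 1000 }).promise();
    if (page.Contents.length > 0) {
      await s3.deleteObjects({
        Bucket: bucket,
        Delete: { Objects: page.Contents.map(o => ({ Key: o.Key })) }
      }).promise();
    }
  } while (page.IsTruncated);
  await s3.deleteBucket({ Bucket: bucket }).promise();
}

emptyAndDeleteBucket('bucket-name').catch(console.error);
```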
## <a id='endpoints'></a> ECS endpoints / access points
| Trust Level | URL | Use Case |
| ------------------------- | -------------------------------------------|---------------------------------------------------------------------------|
| Public | https://ds11s3.swisscom.com | Customers who want to connect to the service over the internet |
## <a id='How'></a> How to access Dynamic Storage with the Amazon S3 API
| | Amazon S3 | Swisscom S3 Compatible Storage |
| ------------------------- | ------------------------------------- | ----------------------------------------------------|
| Access Key ID | Access Key ID | Access Key ID |
| Secret Access Key | Secret Access Key | Secret Access Key |
| Bucket Path Style | http[s]://s3.amazonaws.com/\<bucket\> | https://ds11s3.swisscom.com/\<bucket\> |
| Bucket Virtual Host Style | http[s]://\<bucket\>.s3.amazonaws.com | https://\<bucket\>.ds11s3.swisscom.com |
| Namespace-style URL | - | https://\<namespace\>.ds11s3ns.swisscom.com/\<bucket\> |
<p class='note'>
<strong>Important</strong>: For security reasons, we support only HTTPS (port 443), not HTTP (port 80).
</p>
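With the AWS JS SDK, the address style is a client-side setting; here is a minimal sketch (placeholder credentials) using the endpoint from the table above:
```javascript
// Sketch: point the AWS JS SDK at Dynamic Storage. With s3ForcePathStyle
// the SDK builds path-style URLs (https://ds11s3.swisscom.com/<bucket>);
// without it, it uses virtual-host style
// (https://<bucket>.ds11s3.swisscom.com).
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com'),
  accessKeyId: '<Namespace-AccountName>',
  secretAccessKey: '<secretAccessKey>',
  s3ForcePathStyle: true // omit for virtual-host-style requests
});
```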
## <a id='Naming'></a> Bucket/Object naming conventions
### Bucket Name
Please follow these conventions when naming S3 buckets in ECS:
* Names must be between 3 and 63 characters long.
* Names must be DNS compliant: lowercase alphanumeric characters (`[a-z0-9]`) and hyphens (`-`).
### Object Name
The following rules apply to the naming of ECS S3 objects:
* Must not be null or an empty string
* Must be between 1 and 1024 Unicode characters long
* There is no validation on characters
## <a id='namespace-style-url'></a> Namespace-style URL
In contrast to Amazon S3, Dynamic Storage uses namespaces, so your bucket names only have to be unique within your namespace rather than globally. On the other hand, you need to use the namespace-style URL (e.g. https://8bb3085l69c42879b4aga5ce7016ffff.ds11s3ns.swisscom.com/mybucket/myobject) to access your content, for example for static web pages.
## <a id='using'></a> Using Amazon S3 compatible software on Dynamic Storage
Most software developed against the Amazon S3 API works with Dynamic Storage.
The few unsupported operations, all of them rarely used, are listed below.
Challenges most often arise on the configuration side: the Amazon URL and ports may be hard-coded, or the software may insist on either path style or virtual-host style.
## <a id='authentication'></a> Authentication
<p class='note'>
<strong>From the <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html">AWS S3 docs</a>:</strong>
The Amazon S3 REST API uses a custom HTTP scheme based on a keyed-HMAC (Hash Message Authentication Code) for authentication. To authenticate a request, first you concatenate selected elements of the request to form a string. You then use your AWS secret access key to calculate the HMAC of that string. Informally, we call this process "signing the request," and the output of the HMAC algorithm "the signature" because it simulates the security properties of a real signature. Finally, you add this signature as a parameter of the request by using the syntax shown in this section. When the system receives an authenticated request, it fetches the AWS secret access key that you claim to have. Then it uses it in the same way to compute a signature for the message it received. Finally, it compares the calculated signature against the signature presented by the requester. If the two signatures match, the system concludes that the requester has access to the AWS secret access key and therefore acts with the authority of the principal to whom the key was issued. If the two signatures do not match, the request is dropped and the system responds with an error message.
</p>
When using ECS, the <strong>Access Key ID corresponds to the \<Namespace-AccountName\></strong> and the <strong>Secret Access Key corresponds to the Shared Secret Access Key</strong>.
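To illustrate the scheme, here is a minimal sketch of a Signature v2 computation in Node.js (placeholder credentials and object path; in practice your SDK signs requests for you):
```javascript
// Sketch: sign a simple GET request with AWS Signature v2.
const crypto = require('crypto');

const accessKeyId = '<Namespace-AccountName>'; // from VCAP_SERVICES
const secretAccessKey = '<secretAccessKey>';   // from VCAP_SERVICES

const date = new Date().toUTCString();
const resource = '/mybucket/myobject';

// StringToSign = Verb \n Content-MD5 \n Content-Type \n Date \n Resource
const stringToSign = ['GET', '', '', date, resource].join('\n');

// The signature is the base64-encoded HMAC-SHA1 of that string,
// keyed with the secret access key.
const signature = crypto
  .createHmac('sha1', secretAccessKey)
  .update(stringToSign)
  .digest('base64');

// Sent along with the request as the Authorization header.
console.log(`Authorization: AWS ${accessKeyId}:${signature}`);
```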
## <a id='s3-api'></a> The Amazon S3 API operations
### <a id='supported'></a> The following operations are supported
#### Signatures
* Signature v2
* Signature v4
#### Bucket operations
* GET Bucket
* GET Bucket ACL
* GET Bucket Location
* GET Bucket Versioning
* GET Service
* GET Bucket lifecycle
* GET Bucket policy
* List Multipart Uploads
* PUT Bucket
* PUT Bucket ACL
* PUT Bucket lifecycle
* PUT Bucket versioning
* PUT Bucket policy
* DELETE Bucket
* DELETE Bucket lifecycle
* DELETE Bucket policy
#### Object operations
* DELETE Object
* DELETE Multiple Objects
* GET Object
* GET Object ACL
* GET Object versions
* HEAD Object
* POST Object
* POST Object restore
* PUT Object
* PUT Object ACL
* Multipart Upload: Initiate, Upload, Complete, Abort, List parts
#### More details
To get more information about the underlying EMC ECS product, please consult their [REST API documentation](http://doc.isilon.com/ECS/3.1/API/index.html).
### <a id='unsupported'></a> The following operations are unsupported (these functions are used very rarely)
#### Bucket operations
* DELETE Bucket tagging
* DELETE Bucket website
* GET Bucket notification
* GET Bucket logging
* GET Bucket requestPayment
* GET Bucket website
* PUT Bucket logging
* PUT Bucket notification
* PUT Bucket requestPayment
* PUT Bucket tagging
* PUT Bucket website
#### Object operations
* GET Object torrent
Using an S3-compatible URL, Dynamic Storage can easily be used as a backend.
## Best Practices
### App Design
* Don’t require low latency; object storage is not optimized for it
* Don’t execute serial workflows. Process many operations in parallel (see the sketch after this list).
* Be prepared to handle failures
  * Most SDKs have built-in retry mechanisms to handle this for you
* Avoid listing buckets during normal workflows
  * The S3 API can only list serially and returns at most 1000 results per page. This means 1000 API calls per 1 million objects.
  * Listing is best suited for periodic cleanup/consistency tasks
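As a sketch of the parallel-instead-of-serial advice (bucket and key names hypothetical):
```javascript
// Sketch: upload many small objects in parallel instead of one after
// the other. The SDK's built-in retry handling (maxRetries) takes
// care of transient failures.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com'),
  maxRetries: 5
});

const keys = ['a.txt', 'b.txt', 'c.txt'];

Promise.all(
  keys.map(key =>
    s3.putObject({ Bucket: 'my-bucket', Key: key, Body: `content of ${key}` })
      .promise()
  )
)
  .then(() => console.log('all uploads done'))
  .catch(err => console.error('upload failed', err));
```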
### Namespaces, Buckets and Objects
* Bucket names must be unique within a namespace
* For best performance, we recommend having fewer than 1000 buckets per namespace (AWS limits you to 100 buckets, and the standard S3 API cannot paginate bucket listings)
* Namespace and bucket names should be DNS compatible
  * They can appear in a DNS record when using virtually hosted buckets
  * In particular, avoid underscores and capital letters; such names are not accessible through DNS
* The number of objects is not a problem
  * There is no need to enforce limits as in a traditional file system
  * ECS is designed to handle trillions of objects
* Ideally, the object size should be >100 KB
  * The ECS internal buffer size is 2 MB; you will see better performance if you align your object size to that
* S3 has no support for directory structures
  * It uses a flat structure
  * Some drivers and apps emulate a directory structure (prefix + delimiter)
* Use metadata search (e.g. by date range) for listing objects
* Be prepared to paginate bucket listings (see the sketch after this list)
  * Align the range to your app and request only what is needed
  * Use Marker, NextMarker, and MaxKeys
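A sketch of paginated listing with Marker/MaxKeys (bucket name hypothetical):
```javascript
// Sketch: page through a bucket listing. listObjects returns at most
// 1000 keys per call; keep going while IsTruncated is set.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com'),
  s3ForcePathStyle: true
});

async function listAllKeys(bucket) {
  const keys = [];
  let marker;
  let truncated = true;
  while (truncated) {
    const page = await s3
      .listObjects({ Bucket: bucket, MaxKeys: 1000, Marker: marker })
      .promise();
    page.Contents.forEach(obj => keys.push(obj.Key));
    truncated = page.IsTruncated;
    // NextMarker is only returned when a delimiter is used; otherwise
    // the last key of the page serves as the next marker.
    marker = page.NextMarker ||
      (page.Contents.length ? page.Contents[page.Contents.length - 1].Key : undefined);
  }
  return keys;
}

listAllKeys('my-bucket').then(keys => console.log(`${keys.length} objects`));
```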
### Large Objects
* Use multipart uploads for large objects (>100 MB); see the sketch after this list
  * Allows you to pause and resume uploads
  * Improves throughput through parallel part uploads
* For sizes < 1 GB, use a part size that is a multiple of 2 MB (e.g. 8 MB)
* For sizes > 1 GB, use exactly 128 MB part size (to fill a full chunk)
* Use byte-range reads for parallel downloads of large objects
  * In Java, use the TransferManager
  * In .NET, use the TransferUtility
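A sketch of a multipart upload with the AWS JS SDK (file and bucket names hypothetical); `s3.upload()` splits the body into parts and uploads them in parallel:
```javascript
// Sketch: multipart upload with an 8 MB part size (a multiple of the
// 2 MB ECS buffer, per the guidance above) and 4 parallel part uploads.
const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com')
});

s3.upload(
  {
    Bucket: 'my-bucket',
    Key: 'big-archive.tar',
    Body: fs.createReadStream('./big-archive.tar')
  },
  { partSize: 8 * 1024 * 1024, queueSize: 4 },
  (err, data) => {
    if (err) console.error('upload failed', err);
    else console.log('uploaded to', data.Location);
  }
);
```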
### Traffic
* You can bypass your application server with pre-signed URLs (see the sketch after this list)
  * The authentication is encoded into the URL itself and executed by the client
  * Users download and upload objects directly from/to ECS as "you"
* The object update frequency should be low
  * Object storage platforms are not designed for transactional workloads; frequent updates hurt performance
  * Ideally, store WORM content (e.g. sensor data, images, videos)
* Avoid HEAD + PUT for conditional overwrite
  * Understand your reasons for doing so; it’s generally not necessary
  * If you must, use PUT with If-None-Match / If-Match instead
  * Use `If-None-Match: *` to conditionally create an object if it doesn’t exist
  * Use `If-Match: {ETag}` to conditionally overwrite an object if the ETag matches
* Use the object copy operation instead of downloading and uploading the object again
* Beware of concurrent write requests for the same object
  * 200 OK means the transaction was committed
  * The order is not guaranteed
  * If order needs to be guaranteed, use conditional PUTs (an ECS extension)
* Consider client-side compression for slow links and compressible content
  * The ECS Java SDK supports ZIP and LZMA
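A sketch of generating a pre-signed download URL (object name hypothetical):
```javascript
// Sketch: create a pre-signed URL so a client can download the object
// directly from the storage endpoint, bypassing your app server.
// The URL embeds the authentication and expires after 15 minutes.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com'),
  s3ForcePathStyle: true
});

const url = s3.getSignedUrl('getObject', {
  Bucket: 'my-bucket',
  Key: 'report.pdf',
  Expires: 900 // seconds
});

console.log('hand this URL to the client:', url);
```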
### ECS Extensions
* Metadata Search
  * Offload some searching from your database
  * Supports `<, >, <=, >=, =, !=` and AND/OR
  * Enabled at the bucket level at creation time
* Byte range uploads
  * Update objects partially
  * AWS objects are immutable
* Atomic append
  * Useful for multi-client streams (e.g. syslog, sensor data)
### Retention and Expiration
* Retention means you cannot update, overwrite, or delete an object until its retention period ends
* There are two ways to assign retention:
  * At the bucket level (compatible with generic S3)
  * As an explicit retention period at the object level
* Expiration automatically deletes an object when it expires
  * In S3 this is set using a bucket lifecycle (see the sketch after this list)
* Retention overrides expiration
  * An object will never expire until its retention ends
* Retention can be redefined at any time
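A sketch of setting expiration through a bucket lifecycle rule (names hypothetical; whether ECS accepts the newer `Filter` rule syntax may depend on the ECS version):
```javascript
// Sketch: objects under the "logs/" prefix are deleted automatically
// 30 days after creation.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com')
});

s3.putBucketLifecycleConfiguration({
  Bucket: 'my-bucket',
  LifecycleConfiguration: {
    Rules: [{
      ID: 'expire-logs',
      Filter: { Prefix: 'logs/' },
      Status: 'Enabled',
      Expiration: { Days: 30 }
    }]
  }
}, err => {
  if (err) console.error('failed to set lifecycle rule', err);
  else console.log('lifecycle rule applied');
});
```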
### Security
* Grant as few permissions as possible
  * E.g. there is no need to grant read-write permissions if your app only needs to read data
* Use client-side encryption for maximum protection
  * But keep your master key secure: if you lose your master key, you lose your data
  * Pre-signed URLs will not work unless the receiver knows your encryption keys
### Data Management
* Use separate buckets for your environments
  * E.g. dev, test, staging, production
  * This makes them easier for developers and DevOps to manage
### Implementation
* Use an SDK for your programming language
  * No need to reinvent the wheel
* Use the ECS Object Client if you want to take advantage of ECS features
  * Currently available for Java only
* Use an AWS SDK if you want to maintain compatibility with AWS
### CORS
To access your bucket objects directly from a single-page application (SPA), without having your backend in between, you can specify CORS policies per bucket using the S3 API; see the [AWS CORS documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html) for details.
There are also S3 GUI clients that support setting CORS.
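From a backend or setup script, a CORS policy can be applied with a call like this (a sketch; origin and bucket names hypothetical):
```javascript
// Sketch: allow a browser app served from https://my-spa.example.com
// to GET and PUT objects in the bucket.
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
  endpoint: new AWS.Endpoint('https://ds11s3.swisscom.com'),
  s3ForcePathStyle: true
});

s3.putBucketCors({
  Bucket: 'cors-bucket',
  CORSConfiguration: {
    CORSRules: [{
      AllowedOrigins: ['https://my-spa.example.com'],
      AllowedMethods: ['GET', 'PUT'],
      AllowedHeaders: ['*'],
      MaxAgeSeconds: 3000
    }]
  }
}, err => {
  if (err) console.error('failed to set CORS', err);
  else console.log('CORS configured');
});
```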
<strong>Be aware that any access and secret keys included in the SPA end up in the user's browser, where they can easily be read by any end user.</strong>
After having configured CORS on your bucket, you need to access it using the `namespaceHost`
endpoint found in the service binding credentials (see the example above). You also need to use path-style requests in your S3 library.
Example with the AWS S3 JS SDK:
```javascript
AWS.config.update({
  accessKeyId: '1bbd085e69c44879b4aea5cedeadf00d-bucketowner',
  secretAccessKey: '<redacted>',
  s3ForcePathStyle: true
});

var s3Endpoint = new AWS.Endpoint('https://1bbd085e69c44879b4aea5cedeadf00d.ds11s3ns.swisscom.com');
var s3Client = new AWS.S3({endpoint: s3Endpoint});

// Create params for S3.listObjects
var readParams = {
  Bucket: 'cors-bucket'
};

// Call S3 to list the bucket contents
s3Client.listObjects(readParams, function(err, data) {
  if (err) {
    console.log("Error", err);
  } else {
    console.log("Success", data);
  }
});
```
Be aware that listing buckets from a browser (which requires CORS) is not supported, not even by AWS S3, since CORS is always set at the bucket level.