Hi, this is Chris from Cookpad's Media Product Globalization department.
I'm going to discuss some pitfalls we've run into while working with AWS AppSync for iOS. This post is not a primer on AppSync, nor is it a general review of whether you should or should not use AppSync for your project. My goal is to point out some various lessons we've learned so far that weren't obvious at first. My second disclaimer is that AppSync itself is under active development, so you can probably expect that some of the points I cover in this post will be altered in the future.
Background
My team has been working on a standalone iOS app for shooting, editing, and sharing 1-minute, top-down recipe videos called Cookpad Studio (here's a completed example video). At the time of this posting, our app is still in closed beta.
The shooting and editing parts are local to an iOS device.
But the sharing to the community part relies on a server backend to share data between users.
For the community part of the app, we decided to use AWS AppSync and various AWS backend technologies as an alternative to more established frameworks like Ruby on Rails.
Our AppSync setup is a bit different than the standard use case. AppSync is designed to be configured by app developers through the Amplify CLI. Since our team has dedicated backend engineers, we've opted to do most configuration and server development through the AWS component services directly (e.g. AWS Lambda, DynamoDB, etc.).
SDKs
AppSync on the iOS side is an amalgamation of a few different AWS SDKs. Luckily, all of them are open source and you can dive into their code when necessary. The three SDKs we're using so far are:
- Authentication - The SDK that facilitates user authentication via Cognito.
- Storage - The SDK that facilitates file uploads/downloads to/from S3.
- API - The GraphQL client that facilitates fetching and mutating records in DynamoDB.
The first thing to understand about these SDKs is that they're all very different. They were written at different times by different teams with different technologies and have evolved with different goals in mind.
To give you an idea of what I mean by different, here's some various specs about each SDK:
- Authentication
- Objective-C & some Swift wrappers
- Uses AWSTask, a fork of Facebook's Bolts Framework, for async communication, alongside Cocoa conventions (e.g. delegates, closures, GCD).
- Storage
- Objective-C
- Uses AWSTask alongside Cocoa conventions.
- API
- Swift
- Uses a custom Promise implementation for async communication, alongside Cocoa conventions.
- Uses
.graphqlconfig.yml
for additional GraphQL configuration.
Authentication SDK
Singletons
I generally prefer to use initializer-based dependency injection over singletons. This is often unavoidable, even when only using Apple's first-party SDKs.
I was pleased to find that code completion gave me a couple different initialization options for AWSMobileClient
, the primary class for interfacing with the Cognito authentication APIs. The most complete of the initializers being:
- (instancetype)initWithRegionType:(AWSRegionType)regionType identityPoolId:(NSString *)identityPoolId unauthRoleArn:(nullable NSString *)unauthRoleArn authRoleArn:(nullable NSString *)authRoleArn identityProviderManager:(nullable id<AWSIdentityProviderManager>)identityProviderManager;
I went down this path, discovering later that using this initializer leaves the AWSMobileClient
instance in a very broken state.
AWSMobileClient
is actually a Swift wrapper and subclass of the Objective-C _AWSMobileClient
class. Inside you'll find some code that certainly stretches my understanding of subclassing rules across Swift and Objective-C:
public class AWSMobileClient: _AWSMobileClient { static var _sharedInstance: AWSMobileClient = AWSMobileClient(setDelegate: true) @objc override public class func sharedInstance() -> AWSMobileClient { return _sharedInstance } @objc public var isSignedIn: Bool { get { if (operateInLegacyMode) { return _AWSMobileClient.sharedInstance().isLoggedIn } else { return self.cachedLoginsMap.count > 0 } } } // ... }
Additionally, the initialize
method that must be called by the client references itself and several other singletons:
_AWSMobileClient.sharedInstance()
DeviceOperations.sharedInstance
AWSInfo.default()
- reads fromawsconfiguration.json
in the bundle.AWSCognitoAuth.registerCognitoAuth(...)
Takeaway: For this SDK and the other AWS SDKs, you have to use the singletons.
Keychain credentials
The Authentication SDK uses the keychain APIs to store user credentials securely.
We changed server environments a few times during development. First, we had a prototype environment, then changed to a more long-term development environment, and finally to a production development in parallel with the development environment. By environment, I mean the keys used to locate our apps resources (e.g. PoolId
, Arn
, ApiUrl
, ApiKey
, etc.).
A few of our team members had installed and ran a release build of the app in the prototype environment at some point, thereby storing some Cognito tokens in their keychain. When we switched to the development environment, we started seeing deadlocks during our authentication bootstrapping process. The bootstrapping process happens on a cold launch and runs the required asynchronous AWSMobileClient
initialization methods.
The debugging steps of deleting the app and reinstalling did not work because the keychain contents are retained by iOS across app installs for the same bundle ID.
Once we had determined that AWSMobileClient
could not handle loading "bad" environment user credentials – user credentials created with a different AWS configuration parameters – I had to create special builds for these devices that called AWSMobileClient.sharedInstance().signOut()
immediately on launch.
We actually saw a similar deadlock in AWSMobileClient
when running the app on the iOS simulator during development, which threw me off the trail a bit during debugging.
Takeaway: Be careful when changing environment configuration parameters.
Drop in Authentication UI
The Authentication SDK includes a drop-in UI. Because we wanted to ship our app to beta users as quickly as possible to start gathering feedback, I was particularly pleased that I wouldn't need to write a custom UI for authentication.
Unfortunately, we found a few dealbreakers that prevented us from using the drop-in UI.
First, the drop-in UI has no support for localization. Since our first market is Japan, we definitely needed the UI to support Japanese. The localization issue has appeared in other contexts as well, especially errors returned by the SDK. I would keep this point in mind if the product you're working on requires any other language besides English.
Second, I was planning on presenting the authentication view controller from our root view controller, an instance of UIViewController
. I found that the entry point to the drop-in UI requires a UINavigationController
:
+ (void)presentViewControllerWithNavigationController:(UINavigationController *)navigationController configuration:(nullable AWSAuthUIConfiguration *)configuration completionHandler:(AWSAuthUICompletionHandler)completionHandler;
This seemed like an odd requirement since the drop-in UI view controller seemed to be presented modally. Digging into the code, I came to the same conclusion as this GitHub Issue: the only API used is the UIViewController
presentation API.
There's also this long-running GitHub Issue with feature requests for the drop-in UI.
Takeaway: Using the drop-in UI may not be feasible for your use case.
Is initialize
an asynchronous task?
The signature of AWSMobileClient
's required initialization method is:
public func initialize(_ completionHandler: @escaping (UserState?, Error?) -> Void)
From this signature, I would assume this function is asynchronous, and therefore anything that depends on the result of this call needs to wait until the completionBlock
is called.
However, if we look at the implementation:
internal let initializationQueue = DispatchQueue(label: "awsmobileclient.credentials.fetch") public func initialize(_ completionHandler: @escaping (UserState?, Error?) -> Void) { // Read awsconfiguration.json and set the credentials provider here initializationQueue.sync { // ... full implementation } }
I wasn't sure what to expect when stepping through this code, but it looks like if initialize
is called on the main thread, the implementation within the sync
closure continues to be executed on the main thread. After the completion handler is called within initialize
and that code runs, control flow returns to the end of initialize
.
Takeaway: You can probably assume that AWSMobileClient.sharedInstance().initialize(...)
is synchronous. However, if you're paranoid about the implementation changing at some point, treat it in your calling code as asynchronous.
Storage SDK
Initialization
Similar to our takeaway from the Authentication's section above about singletons, I recommend being extra cautious about the set up of your AWSS3TransferUtility
instance.
Internally, AWSS3TransferUtility
the class maintains a static dictionary of instances and a default instance.
// AWSS3TransferUtility.m static AWSSynchronizedMutableDictionary *_serviceClients = nil; static AWSS3TransferUtility *_defaultS3TransferUtility = nil;
There are some directions in the API docs about how to register an instance with custom configuration options.
However, if you decide to use the default instance like I did, you need to set the service configuration in a different singleton before calling AWSS3TransferUtility.default()
for the first time. (I only learned this by eventually finding my way to the implementation of AWSS3TransferUtility.default()
after struggling for hours with various unauthorized
errors at runtime when trying to perform uploads).
AWSServiceManager.default()!.defaultServiceConfiguration = AWSServiceConfiguration(region: .APNortheast1, credentialsProvider: AWSMobileClient.sharedInstance()) let transferUtility = AWSS3TransferUtility.default()
Takeaway: Register your own AWSS3TransferUtility
. Or if you want to use the default, set an AWSServiceConfiguration
in the AWSServiceManager
singleton before calling AWSS3TransferUtility.default()
for the first time.
AWSTask for upload & download operations
The Storage SDK uses AWSTask throughout. AWSTask
is a fork of Facebook's Bolts Framework.
Tasks... make organization of complex asynchronous code more manageable.
The usage of the primary Storage SDK's APIs for uploading and downloading are shown in the API docs, but since I wanted to ensure all codepaths for errors were handled properly, I had to dig a little deeper to understand how these tasks work under the hood. I'll use multi-part uploading as an example, but this applies to all three scenarios (uploading, multi-part uploading, and downloading).
I've annotated the types so that you can see the identity of what's actually flowing around all these closures.
let expression = AWSS3TransferUtilityMultiPartUploadExpression() expression.progressBlock = { (task: AWSS3TransferUtilityMultiPartUploadTask, progress: Progress) in DispatchQueue.main.async(execute: { // ... }) } let completionHandler: AWSS3TransferUtilityMultiPartUploadCompletionHandlerBlock = { (task: AWSS3TransferUtilityMultiPartUploadTask, error: Error?) -> Void in DispatchQueue.main.async { // ... } } let taskQueuedHandler: (AWSTask<AWSS3TransferUtilityMultiPartUploadTask>) -> Any? = { (task: AWSTask<AWSS3TransferUtilityMultiPartUploadTask>) -> Any? in DispatchQueue.main.async { if let result = task.result { // An `AWSS3TransferUtilityMultiPartUploadTask` was queued successfully. } else if let error = task.error { // The `AWSS3TransferUtilityMultiPartUploadTask` was never created. } else { // Not sure if this code path is even possible. } } return nil } let task: AWSTask<AWSS3TransferUtilityMultiPartUploadTask> = transferUtility.uploadUsingMultiPart(fileURL: fileURL, bucket: bucketName, key: objectKey, contentType: contentType, expression: expression, completionHandler: completionHandler) task.continueWith(block: taskQueuedHandler)
The overloaded use of the identifier Task
in the types caused me some confusion at first. AWSS3TransferUtilityMultiPartUploadTask
is not a subclass or in any way related to AWSTask
as a concept.
Let's start at the bottom. The transferUtility.uploadUsingMultiPart(...)
method takes some parameters, two closures, and returns an AWSTask<AWSS3TransferUtilityMultiPartUploadTask>
: an AWSTask
that will asynchronously return an AWSS3TransferUtilityMultiPartUploadTask?
or an Error?
to the block provided to continueWith
.
The moment of understanding I had was realizing that just creating an AWSS3TransferUtilityMultiPartUploadTask
is an asynchronous, fallible operation, with an error case that must be handled. That is why we've defined taskQueuedHandler
above.
Keep in mind that taskQueuedHandler
may be called on a background queue.
completionHandler
will always get called if the if let result = task.result
code path in taskQueuedHandler
executes. completionHandler
still has to handle both success and failure cases.
If, for example, you start a UIActivityIndicatorView
as loading before calling uploadUsingMultiPart
, but you don't handle the task.continueWith
error, it's possible that the UIActivityIndicatorView
will spin forever.
Takeaway: If you're expecting the result of an upload or download at some point in the future, you need to handle the error case in task.continueWith
.
AWSTask for get{*}Tasks
Since AWSS3TransferUtility
maintains its own database of tasks, even across app cold launches, you may need to retrieve these tasks. This use case is shown in the API docs.
let downloadTasks = transferUtility.getDownloadTasks().result let uploadTasks = transferUtility.getUploadTasks().result let multiPartUploadTasks = transferUtility.getMultiPartUploadTasks().result
Note that even though these getter functions return an AWSTask
, they're not asynchronous and the result
is available immediately. There's also no way for the returned AWSTask
to contain an error.
Takeaway: Sometimes the AWS SDKs return AWSTask
s for synchronous operations. Sometimes they return AWSTask
s for operations that are not fallible. However, be careful relying on this behavior because the underlying implementation could always be changed in a future version without your knowledge.
API SDK
Because AWSAppSyncClient
in built on top of ApolloClient
, some of the below points are applicable to Apollo GraphQL as well.
Offline Mutations
One of the marketing points of AppSync is that mutations (i.e. POST
, PUT
, or DELETE
in the REST world) can be triggered by a user while they're offline, and the mutations will be queued in local storage and relayed to the server when the user's device has connectivity again.
This is a feature set available in certain types of apps, including many of Apple's own stock apps like Reminders or Contacts.
However, this behavior does not always make sense for all types of mutations. Even when it does make sense, it often comes with an additional heavy burden of proper UX design. Handling errors. Handling conflicts. These are problems that even the most mature apps still struggle with.
In our app, we have a pretty straightforward createUser
mutation (i.e. sign up). createUser
is a particularly poor candidate for offline mutation support:
- It has several server-side validation rules for form elements (e.g. unique
username
). - The app is logically partitioned to only allow registered users to access certain parts of the app.
Before learning that offline mutations were the default in AppSync and could not be turned off, I was struggling to understand why when simulating network errors, the completion block to my mutation was never getting called, even beyond the timeout duration.
When I realized this behavior was intentional, it took more time to figure out a workaround that didn't require the huge maintenance burden of subclassing or implementing manual timeout code throughout the app.
It turns out the workaround is as simple as using the underlying appSyncClient.apolloClient
instance.
// Before appSyncClient.perform(mutation: mutation, queue: .main, optimisticUpdate: nil, conflictResolutionBlock: nil) { (result, error) in // ... } // After appSyncClient.apolloClient?.perform(mutation: mutation, queue: .main) { (result, error) in // ... }
From my reading of the
AWSAppSyncClient
source, it's safe to force unwrapapolloClient
at the moment. But certainly use caution in your particular use case.
With the above code, mutations attempted while offline will fail with an error after the default timeout (60 seconds) and call the completion block.
Takeaway: Use appSyncClient
's underlying apolloClient
directly to perform mutations that shouldn't be queued offline.
Errors
Overall, GraphQL is a welcome addition of structure compared to REST. However, I've found the error story to be a little disappointing.
When writing my first AppSync API request handler, I soon found the control flow for errors to be a little overwhelming. All layers of the stack have their own set of errors, and Swift's untyped errors don't help the situation.
Let's look at an example fetch request. I've set up and documented the completion handler.
appSyncClient.fetch(query: query) { (result: GraphQLResult<Query.Data>?, error: Error?) in // 1 if let networkError = error as? AWSAppSyncClientError { // The first layer of error handling is a network stack error. // 2 } else if let unknownError = error { // This case probably shouldn't happen, but I don't know the network stack // well enough to guarantee that. // 3 } else if let data = result?.data? { // This is sort of the happy path. We got the data we requested. // However, `result?.errors?` may still contain errors! // It depends on your use case whether you want to ignore them if // `data` is non-null. // 4 } else if let graphQLErrors = result?.errors?, !graphQLErrors.isEmpty { // According to the GraphQL spec, graphQLErrors will be a non-empty list. // These errors are also more or less untyped. // 5 } else { // Although logically we should have covered all the cases, // the compiler can't statically guarantee we have so we should throw // an `unknown` error from here. } }
- The network stack is provided by
AWSAppSyncHTTPNetworkTransport
and throwsAWSAppSyncClientError
. In the.requestFailed
case, the Cocoa error can be extracted and thelocalizedDescription
shown to the user. The other cases probably aren't that useful. Note that althoughAWSAppSyncClientError
conforms toLocalizedError
, the error messages are English only and usually add various codes that would probably be unideal to show users. - I haven't dug through the network stack enough to know whether there are other error types that can be thrown, but the presence of an error at this level of the stack probably means that
result
will benil
. - The GraphQL spec says that
result
can contain bothdata
anderrors
. It's up to you to determine whether you need to handle this case, and if so, how to handle it. For many use cases though, getting data means success. - The GraphQL spec defines an
error
as a map with amessage
that's intended for developers, and optionallylocations
andpath
fields. As of the June 2018 spec, user fields should be contained within theextensions
field. However, the AppSync spec was based on the October 2016 GraphQL spec, and therefore defines anerrorType
field in the root of theerror
map.errorType
is aString
type which makes it more readable to developers, but also more error prone. - All those nullable fields have left us with an
else
case.
I really wish errors were typed in GraphQL (and Swift too!).
Takeaway: Handling the results of a fetch
or perform
requires some knowledge about the various layers of the network stack. Make sure you've considered the possible errors at each layer and how can you help your user recover from them.
Equatable structs
The codegen utility included in AWS Amplify and part of Apollo's tooling does not support generating structs that conform to Equatable
. Generated enums do conform to Equatable
.
The way structs are laid out, all the struct's data is stored in a dictionary [String: Any?]
(typealiased as Snapshot
). Its typed properties are decoded from or encoded into that dictionary on the fly in a property's getter and setter, respectively.
Equatable
could probably be generated the old fashioned way by comparing all properties. I'm unsure of whether this could introduce performance problems for deeply nested structs due to the lazy (and non-cached) decoding.
This was discussed in a (now closed) GitHub issue.
Takeaway: Code generated enums conform to Equatable
. Code generated structs do not conform to Equatable
. If you need Equatable
structs, you'll have to write the ==
function yourself manually, generate it with a tool like Sourcery, or create wrapper structs.
Query watching
AWSAppSyncClient
has a useful watch
feature that allows you to receive updates to any resources fetched by the query you're watching throughout the lifetime of the watch
. Experimenting with this feature, I've found a few conceptual points to keep in mind.
watch
works by first adding a subscription to any changes to the store. Next, it makes a normal fetch with the same configurable cache policy options available to fetch
. The results of this initial fetch are used to create a list of dependentKeys
. When the cache notifies the GraphQLQueryWatcher
that its contents have changed, the GraphQLQueryWatcher
checks if any of the changed keys are contained in its dependentKeys
, and if so, it fetches the query again (with cache policy .returnCacheDataElseFetch
) then calls the closure registered in watch
with the result.
Set up the cache key identifier on your store
As stated in the docs, you have to tell apolloClient
how you uniquely identify your resources:
// Use something other than "id" if your GraphQL type is different appSyncClient?.apolloClient?.cacheKeyForObject = { $0["id"] }
In their example, it says that a Post
with id = 1
would be cached as Post:1
. However, in my testing, only the id
itself is used (i.e. 1
). Currently, we have id
s that are unique across our resources, but if you don't, you may need to investigate this more to ensure you don't have key collisions in the cache.
A fetch must succeed before watching will work
Since dependentKeys
are derived from the results of the first fetch
(and is regenerated on subsequent fetches), this fetch
has to be successful in order for the watch
to respond to changes produced by other queries.
If you use watch
, you have to allow your user to retry in case the initial fetch fails. Call GraphQLQueryWatcher.refetch()
. Even if the same query is fetched from a different part of your app, this query must succeed at least once in order to receive changes.
Use a pessimistic cache policy
You essentially cannot (safely) use the .returnCacheDataDontFetch
cache policy with watch
.
Granted, it's rare case to want to do so. But if you thought that the partial results from a different query in your app could be picked up by a watch query, this won't work. It has to be the exact same query and it has to have been fetched before with the exact same parameters from the server.
If you used .returnCacheDataDontFetch
as the cache policy and the fetch resulted in a cache miss, you would have to call refetch()
anyway to make a fetch to the server.
It's not straightforward to use watch with paging queries
It's common in GraphQL to use a Connection
type to implement indefinite paging.
Let's look at the following GraphQL schema:
type MovieConnection { movies: [Movie!]! # contains a maximum of 10 items nextToken: String } type Query { getLatestMovies(nextToken: String): MovieConnection! getMovie(id: Int!): Movie! }
For example, if you set up a watch for the first call to getLatestMovies(nextToken: nil)
, this watch will only respond to changes to the 10 Movie
resources returned by the query. If you make a normal fetch
request for the next page using nextToken
, the watch you have set up will not observe changes in the Movie
resources returned in the second request.
If you wanted to respond to changes to any Movie
returned in any pages, you'd have to do a watch
for each page and add the GraphQLQueryWatcher
to a collection. The logic in your result handlers would depend heavily on how you structured your data source since the result could be an add or an update.
It's not possible to watch resources outside a query
It's probably obvious from the interface to watch
since the first parameter is of type GraphQLQuery
, but you cannot watch for changes to an arbitrary key in the cache. For example, if there was a resource in your database keyed by id 12345
, you can't simply register a watcher with a dependent key for this id
.
Any connection between resources and queries must be resolved by the server
If you have two different queries that you know reference the same object, that relationship must be codified by the server.
Continuing with the getLatestMovies
example in the previous section, imagine we received a list of 10 Movie
s and wanted to watch for granular changes in the Movie
with id = 12345
.
To accomplish this you might think you could simply call:
let watcher = appSyncClient.watch(query: GetMovieQuery(id: 12345), cachePolicy: .returnCacheDataDontFetch, queue: .main, resultHandler: { (result, error) in ... }
But this would not work! It would result in a cache miss and the watch would be inert until refetch()
was called.
Although the Movie
returned by GetMovieQuery(id: 12345)
is already in the cache, the association between the query itself and the Movie
resource can't be resolved by AppSync/Apollo until the server returns the result for the query and this result is cached too.
Conclusion
In this post, I outlined some development points to watch out for in the Authentication, Storage, and API SDKs of AWS AppSync. I hope the takeaways from this post are valuable for current and future users of AWS AppSync.