Can offline changes in separate client-side IndexedDB databases be merged when the browsers get back online? - firefox4

IndexedDB in Firefox 4 gives us new potential for building apps that do client-side database querying and storage, both offline and online. This is a very new and very interesting alternative, since my organization is split across many countries with poor internet service.
Big picture: “IndexedDB allows web apps to store large amounts of data on your local system (with your explicit permission, of course) for fast offline retrieval at a later time.”
My problem:
I don't understand the following.
How can offline changes in a client-side database, say in Haiti, then be merged with a central Washington DC database?
Or even, how would 3 separate database changes on 3 clients in Haiti get synced with one another? Perhaps it is impossible?
What are the benefits and limits of such a client-side database?

Your questions are spot on and sum up some of the challenges of data reconciliation. All in all, this is possible, but I wouldn't recommend attempting it alone. In the database space, projects like CouchDB are working on this, and they show that it's somewhat of a Herculean task.
Merging data across object stores means you're going to need a lot of application logic to pull it off elegantly. For example, what happens when two offline apps update the same row? You'd have a "merge conflict" and this kind of situation is why source control applications like Git are so complex.
To implement this idea, you might take a page from Git's book and use "event sourcing" as a way to roll through changes. I'm working with a similar concept in IDB and it works quite nicely. Worth noting, I am not trying to merge changes across object stores, but rather to manage revisions in a single object store, so your task would be considered significantly more complex.
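A minimal sketch of that event-sourcing idea, with plain JavaScript objects standing in for IndexedDB object stores (all names here are illustrative): each client appends change events to a local log, and merging becomes a union-and-replay of logs rather than a row-by-row merge.

```javascript
// Each change is recorded as an event rather than an in-place update.
function applyEvent(state, event) {
  const next = Object.assign({}, state);
  if (event.type === "put") next[event.key] = event.value;
  if (event.type === "delete") delete next[event.key];
  return next;
}

// Merge two clients' logs by timestamp, ties broken by client id.
function mergeLogs(logA, logB) {
  return logA.concat(logB).sort(
    (a, b) => (a.ts - b.ts) || a.client.localeCompare(b.client)
  );
}

// Replay the merged log from scratch to get the reconciled state.
function replay(log) {
  return log.reduce(applyEvent, {});
}

// Two offline clients touched the same key; the later write wins here,
// but the conflict is visible in the log and could be handled differently.
const haiti = [{ ts: 1, client: "haiti", type: "put", key: "site", value: "PAP" }];
const dc = [{ ts: 2, client: "dc", type: "put", key: "site", value: "DC" }];
const state = replay(mergeLogs(haiti, dc));
```

This is just the skeleton; the hard part the answer alludes to (detecting and surfacing conflicting events to the user) still requires application logic.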

Conversion from WebSQL to IndexedDB

I am currently working on a mobile application for time card submission that works with an already existing accounting application. Needless to say, this application relies heavily on relational databases and that particular reliance translates to the mobile app.
In its current state, the mobile application uses WebSQL for offline access to tables that are loaded onto the device while the user has internet access. Time cards are created in the local database and then uploaded later when the user regains internet access. This functionality is the core of the application.
My question is whether a transition to IndexedDB is A.) Feasible and B.) A smart move. Had WebSQL avoided deprecation, this wouldn't be an issue. I am beginning to understand IndexedDB better and how JSON can make it useful for relatively complex data storage, but I can't really wrap my head around whether it can actually replicate the functionality of a relational database.
Based on the requirements of the application, it appears that IndexedDB is not an alternative, but I'm still very new to the concept and open to enlightenment.
So can IndexedDB potentially be an alternative? Can IndexedDB be used to replicate the functionality of a database with multiple related tables and large amounts of data? If so, where can I find information on how to do it? If not, do I have an alternative to the two? (Assuming WebSQL does, in fact, lose support and IndexedDB isn't viable.)
On a related note, would IndexedDB speed up the population of the local database? PHP is currently used to populate the database while the user is online, and it does take a decent amount of time to fill a table with a hundred or so options. When it gets near a thousand, the application just flat out breaks down (this is an uncommon occurrence and the clients are strongly discouraged from using that much data).
Any help on this would be great, I'm very new to programming in general and VERY new to web development.
According to http://www.caniuse.com/indexeddb , support for IndexedDB is rather limited, so I wouldn't jump to it for now. But that will most likely change in the future, as the implementations mature.
Personally, I find IndexedDB strange and complicated, especially when you go beyond simple single-table operations. I have not run any actual tests on it, but since you have to do some things (like joining records) manually, you will end up with quite a lot more JS code, which translates to more area for bugs to hide.
So can IndexedDB potentially be an alternative? Can IndexedDB be used to replicate the functionality of a database with multiple related tables and large amounts of data? If so, where can I find information on how to do it? If not, do I have an alternative to the two? (Assuming WebSQL does, in fact, lose support and IndexedDB isn't viable.)
A quick search brings up http://blog.oharagroup.net/post/16394604653/a-performance-comparison-websql-vs-indexeddb , which shows some patterns for IndexedDB multiple table usage. It also shows some performance comparison, which looks promising for IndexedDB. However, see this answer and take this benchmark with a grain of salt.
On a related note, would IndexedDB speed up the population of the local database? PHP is currently used to populate the database while the user is online, and it does take a decent amount of time to fill a table with a hundred or so options. When it gets near a thousand, the application just flat out breaks down (this is an uncommon occurrence and the clients are strongly discouraged from using that much data).
I am a developer of a similar app for a different industry, and my experience is quite different: even on an older iPhone 3GS, the WebSQL solution runs adequately - we have tested schemas with several thousand records per table with no significant slowdowns. Are you maybe inserting each row in a separate transaction?
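If per-row transactions are indeed the bottleneck, batching is the usual fix. This helper (illustrative names, not part of any library) builds one multi-row INSERT so a population run can execute a single statement inside a single WebSQL transaction:

```javascript
// Build a single batched INSERT statement instead of one statement
// (and one transaction) per row.
function buildBatchInsert(table, columns, rows) {
  const placeholders = rows
    .map(() => "(" + columns.map(() => "?").join(", ") + ")")
    .join(", ");
  const sql = "INSERT INTO " + table + " (" + columns.join(", ") +
    ") VALUES " + placeholders;
  const args = [].concat(...rows); // flatten rows into one argument list
  return { sql, args };
}

// In WebSQL you would then run db.transaction(tx => tx.executeSql(sql, args))
// once, rather than opening a transaction for each row.
const q = buildBatchInsert("options", ["id", "label"], [[1, "oak"], [2, "cherry"]]);
```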
Most of our clients are satisfied with the app since it runs on iPads, iPhones, Android tablets and Google Chrome. But one client's security requirements only permit usage of Windows and IE, no alternative browsers or non-Windows mobile devices. That is the only scenario we've seen where WebSQL doesn't cut it. We looked into IndexedDB and native apps, and so far we consider native apps a better option (C# base library could be shared between Xamarin and Windows Phone apps, not to mention C# would be so much more pleasant to code than loose-typed JS callback hell).
I'm a couple of years late, but figured I'd drop in and answer the questions of OP (for both his benefit (possibly) and that of anyone who finds themselves here with the same questions) which were not directly answered already, as well as offer some suggestions!
do I have an alternative to the two? (Assuming WebSQL does, in fact, lose support and IndexedDB isn't viable.)
IndexedDB is the only database that remains on the W3C standards track at this point, and as such, is pretty much the only option as far as native client-side databases go.
So can IndexedDB potentially be an alternative? Can IndexedDB be used to replicate the functionality of a database with multiple related tables and large amounts of data?
Well...
IndexedDB is a non-relational document store.
Non-relational: Does not allow for the definition of any relationships between the entries which exist in its object stores (tables). All such relationships must be defined and maintained by the application.
Document store: A repository of documents, which are arbitrarily structured data items.
A relational database, on the other hand, supports both the definition and maintenance of relationships between table entries. The majority of these databases are also row stores, which (as you probably know) are repositories of tuples contained in tables which define their respective structures.
So to answer your question, yes, you can replicate in IndexedDB the functionality provided to you by your relational database. And if any of the data items in the store are related to each other in any way, you'll have to, to some extent.
But considering the client-side database is simply a temporary stop-over for your data, it'd be wise to replicate only the bare minimum to maintain the integrity of your data on there, and just take advantage of the rest of such functionality as it exists in the relational database on the server side once the data is transferred.
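To make the "defined and maintained by the application" point concrete, here is what a minimal client-side join might look like, with plain arrays standing in for the contents of two IndexedDB object stores (names are illustrative):

```javascript
// Build an index over the "customers" store, then attach the matching
// customer to each order: the work a relational database would do for you.
function join(orders, customers, fk, pk) {
  const byId = new Map(customers.map(c => [c[pk], c]));
  return orders.map(o => Object.assign({}, o, { customer: byId.get(o[fk]) }));
}

// In a real app these arrays would be read via IndexedDB cursors.
const customers = [{ id: 1, name: "Acme" }];
const orders = [{ id: 10, customerId: 1, total: 99 }];
const joined = join(orders, customers, "customerId", "id");
```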
If the thought of converting still seems palatable, go for it!
But before you do, there are a couple of things you should know about IndexedDB. The first should be evident given the type of database that it is: it does not natively support SQL. The second is that its API is... unwieldy to say the least.
Given these things, I suggest you check out BakedGoods. With it, placing one or more data items in an IndexedDB database, for example, is as simple as:
bakedGoods.set({
    data: [{key: "key1", value: "value1"}, {key: "key2", value: "value2"}],
    storageTypes: ["indexedDB"],
    complete: function(byStorageTypeStoredItemRangeDataObj, byStorageTypeErrorObj){}
});
Since the replication of some relational database functionality may require complex CRUD operations, you may want to take advantage of BakedGoods' support for user-defined storage operation functions.
Just for the sake of complete transparency, BakedGoods is maintained by this guy right here :) .
Generally, developers who work with SQL have difficulty using IndexedDB due to its complex APIs. The solution is to use an IndexedDB library that makes it easy, but even to use a library you need to know a few IndexedDB concepts. JsStore is an IndexedDB library which removes IndexedDB's complexity and makes it super easy to use. It provides SQL-like APIs which make it easy to learn.
Let's say you have the SQL query: select * from table_name where id=1 and name='abc'
In JsStore, the query will be:
var con = new JsStore.Instance(db_name);
con.select({
    From: table_name,
    Where: {
        Id: 1,
        Name: 'abc'
    }
}).then(function(result){
    console.log(result);
});

Updatable offline storage / database

Currently I'm trying to learn nativescript and for this I thought about doing an App like 'Anki'
But while thinking about the data storage, I stumbled upon the problem of how to save my flash cards locally to keep the app offline (for example with SQLite), save the user's review time for each card (e.g. to show again in 10 minutes or 1 day), AND have update functionality to add new cards to the database without deleting the user's data.
What's the best way to solve that problem, especially when I want to provide the updates with an App-Update and without fetching everything from an external database?
I don't have any code yet, therefore a recommendation on how to solve that would be nice.
There are several methods in NativeScript you can use:
NativeScript-Sqlite (disclaimer: I'm the author)
This allows full access to Sqlite for saving and loading items; you can have databases as big as you need, and Sqlite is very fast. Sqlite's biggest drawback is the speed of writes; if you have a LOT of writing it can be slower than just writing to a file yourself.
NativeScript-LocalStorage (disclaimer again: I'm the author)
This is more geared toward smaller data sizes; when the app starts, it has to load the entire JSON-backed data store into memory. This is really fast overall, but not something you want to use for tens of thousands of records.
NativeScript-Couchbase
This uses sqlite for local storage and can use couchbase for the remote storage; very nice for having syncable storage - couchbase can be your own server or a leased or rented server.
NativeScript-Firebase
This is also very useful for having syncable storage; however Google charges for FireBase at a certain point.
Built-in AppSettings.
This is really designed for a few application settings, not designed for lots of data. But useful for the smaller amounts of data.
Roll your own on the file system.
I have done this in a couple of my projects; basically a hybrid between my localstorage plugin and a mini-SQL-type system. One project was very write-dependent, so it made more sense to generate the 20 or so separate files on the phone (one per table) because I could save them much quicker than inserting/replacing > 100,000 records into sqlite each time the app started up. It had minimal searching needs.
Your storage really needs to depend on what you are doing; it is a balancing act. Lots of searchable data: sqlite wins in almost all cases. Lots of frequent writing: something you create yourself might be a lot faster.
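For illustration, the "roll your own" hybrid described above might reduce to something like this sketch, where each table serializes to one blob that can be saved in a single file write on the device (the file-system calls themselves are omitted; everything here is hypothetical, not the plugin's actual API):

```javascript
// One serialized blob per table: a write-heavy table becomes a single
// file write instead of thousands of sqlite inserts at startup.
class TableStore {
  constructor(serializer = JSON) {
    this.tables = new Map(); // tableName -> array of rows
    this.serializer = serializer;
  }
  insert(table, row) {
    if (!this.tables.has(table)) this.tables.set(table, []);
    this.tables.get(table).push(row);
  }
  // Returns the text that would be written to the per-table file.
  serialize(table) {
    return this.serializer.stringify(this.tables.get(table) || []);
  }
  // Loads a table from previously written file contents.
  load(table, text) {
    this.tables.set(table, this.serializer.parse(text));
  }
}
```

The trade-off the answer names shows up directly: reads require scanning the in-memory array, so this only makes sense when searching needs are minimal.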

Real-time data warehouse: Event-driven vs Polling

I know this question has been asked before at PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data.
But I want to rephrase this question.
I am attempting a real-time data warehouse. The difference between real-time and near real-time is huge. I find the real-time data warehouse to be event-driven and transactional in approach, while near real-time would do the same in batch mode but poll the data more frequently. Polling would put so much extra load on the production server that it would certainly kill the production system: being a batch approach, it would scan through all the tables for changes and take the rows which have changed since a cut-off timestamp. By event-driven, I mean it would be specific to the tables which have undergone changes and focus only on the transactions which are happening currently.
But the source system is an elephant of a system: SAP, which let's assume has 25,000 tables. It is not easy to model that, and not easy to write database triggers on each table to capture each change. I want the impact on the production server to be minimal.
Is there any trigger at the database level that could capture all changes happening in the database in one trigger? Is there any way to run that database trigger on a different database server so that the production server goes untouched?
I have not been keeping pace with changes happening to database technology and am sure some nice new technologies would have come by to capture these changes easily.
I know of log miners and change data capture, but it would be difficult to filter out the information I need from the redo logs.
Are there alternate ways to capture database write operations on the fly?
Just for completeness' sake, let us assume the databases are a heterogeneous mix of Oracle, SQL Server and DB2. My concern is with the concepts we want to develop.
This is a universal problem, every company is looking for easy to implement solution. So a good discussion would benefit all.
Don't ever try to access SAP directly. Use the APIs of SAP Data Services (http://help.sap.com/bods). Look for the words "Integrator Guide" on that page for documentation.
This document should give you a good hint about where to look for your data sources (http://wiki.scn.sap.com/wiki/display/EIM/What+Extractors+to+use). Extractors are kind-of-somewhat like views in a DBMS; they abstract all the SAP internals into something human-readable.
As far as near-real-time, think in terms of micro-batches. Run your extract jobs every 5 (?) minutes, or longer if necessary.
Check out the Lambda Architecture from Nathan Marz (I provide no link, but you'll find the resources easily). The reference implementation is all Java and NoSQL, but the main ideas are applicable to classical relational databases as well. In a nutshell: you have two implementations, one real-time but responsible for only a limited time interval. The "long tail" is maintained with a classical best-practice batch implementation.
The real time part is always discarded after the batch refresh, effectively blocking the propagation of the problems of the real time processing in the historical data.
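In miniature, the Lambda split described above can be pictured as a query that overlays a small realtime view on an authoritative batch view (an illustrative sketch, not Marz's actual implementation):

```javascript
// The batch view is authoritative but stale; the realtime view covers only
// the interval since the last batch run and is discarded after each refresh,
// which keeps realtime-processing bugs out of the historical data.
function query(batchView, realtimeView, key) {
  return realtimeView.has(key) ? realtimeView.get(key) : batchView.get(key);
}

const batchView = new Map([["orders:2024-01", 120]]);
const realtimeView = new Map([["orders:2024-02", 7]]); // not yet batched
```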
I just edited the title to "Real-time data warehouses: Event-driven vs Polling".
As Ron pointed out, NEVER TOUCH SAP TABLES DIRECTLY. There are adapters upon adapters for accessing SAP tables. This builds another layer in between, but it is unavoidable. One piece of good news I want to share: a customer did a study of SAP tables and found that only 14% of the tables are actually populated or touched by the SAP system. Even then, 14% of 25,000 tables still comes to a huge data model of 2000+ entities. Again, micro-batches amount to dividing the system into Purchasing, Receivables, Payables, etc., which heads toward a data mart and not an EDW. I want to focus on an Enterprise Data Warehouse.
thanks a lot
As of now I can see only two solutions:
Write services on the source systems. If the source is COBOL, put those in services. Put all the services on a service bus and somehow trap when changes happen to the database; how that trap would work needs to be explored. But from the outset it appears to be a very expensive and uncertain proposition, and convincing management to accept a three-year lag would be difficult. Services are not easy.
Log shippers: this is a trusted database solution. Logs would be available on another server, so the production server need not be burdened, and there are a good number of tools available. But the spirit does not match: the event-driven element is missing, so the action is not captured while things are happening. I will settle for this.
Can you suggest any other solution? I would be waiting.

Best practice for on/off line data synchronization using AngularJS and Symfony 2

I'm building a relatively complex and data-heavy web application in AngularJS. I'm planning to use PHP as a RESTful backend (with Symfony2 and FOSRestBundle). I have spent weeks looking around for different on/off line synchronization solutions, and there seem to be many half solutions (see the list below for some examples). But none of them seem to fit my situation perfectly. How do I go about deciding which strategy will suit me?
Which issues determine “best practices” for building an on/off line synchronization system in AngularJS and Symfony 2 needs some research, but off the top of my head I want to consider things like speed, ease of implementation, future-proofing (a lasting solution), extensibility, resource usage/requirements on the client side, multiple offline users editing the same data, and how much and what type of data to store.
Some of my requirements that I'm presently aware of are:
The users will be offline often and then need to synchronize (locally created) data with the database
Multiple users share some of the editable data (potential merging issues needs to be considered).
Users might be logged in from multiple devices at the same time.
Allowing large amounts of data to be stored offline (up to a gigabyte)
I probably want the user to be able to decide what he wants to store locally.
Even if the user is online I probably want the user to be able to choose whether he uses all (backend) data or only what's available locally.
Some potential example solutions
PouchDB - Interesting strategies for synchronizing changes from multiple sources
Racer - Node lib for realtime sync, build on ShareJS
Meteor - DDP and strategies for sync
ShareJS - Node.js operational transformation, inspired by Google Wave
Restangular - Alternative to $resource
EmberData - EmberJS’s ORM-like data persistence library
ServiceWorker
IndexedDB Polyfill - Polyfill IndexedDB with browsers that support WebSQL (Safari)
BreezeJS
JayData
Loopback’s ORM
ActiveRecord
BackBone Models
lawnchair - Lightweight client-side DB lib from Brian Leroux
TogetherJS - Mozilla Labs’ multi-client state sync/collaboration lib.
localForage - Mozilla’s DOMStorage improvement library.
Orbit.js - Content synchronization library
(https://docs.google.com/document/d/1DMacL7iwjSMPP0ytZfugpU4v0PWUK0BT6lhyaVEmlBQ/edit#heading=h.864mpiz510wz)
Any help would be much appreciated :)
You seem to want a lot of stuff, the sync stuff is hard... I have a solution to some of this stuff in an OSS library I am developing. The idea is that it does versioning of local data, so you can figure out what has changed and therefore do meaningful sync, which also includes conflict resolution etc. This is sort-of the offline meteor as it is really tuned to offline use (for the London Underground where we have no mobile data signals).
I have also developed an eco system around it which includes a connection manager and server. The main project is at https://github.com/forbesmyester/SyncIt and is very well documented and tested. The test app for the ecosystem will be at https://github.com/forbesmyester/SyncItTodoMvc but I have yet to write virtually any docs for it.
It is currently using LocalStorage but will be easy to move to localForage as it actually is using a wrapper around localStorage to make it an async API... Another one for the list maybe?
To work offline with your requirements, I suggest dividing the problem into two scenarios: content (HTML, JS, CSS) and data (REST API).
The content
Will be stored offline via AppCache for small apps, or for advanced cases with the awesome Service Workers (Chrome 40+).
The data
Requires solving both storage and synchronization, and it becomes a more difficult problem.
I suggest a deep reading of the Differential Synchronization algorithm, and take the following tips into consideration:
Frontend
Store the resource and its shadow (using, for example, the URL as key) in localStorage for small apps, or in more advanced alternatives (PouchDB, IndexedDB, ...). With the resource you can work offline, and when you need to synchronize with the server, compute the diffs between the resource and its shadow and send them to the server in a PATCH request.
Backend
On the backend, consider storing the shadow copies in Redis.
Both sides (frontend/backend) need to identify the client node; to do so you could use an x-syn-token HTTP header (sent in all client requests via Angular interceptors).
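To sketch the shadow idea from Differential Synchronization (a field-level diff is shown for brevity; a real implementation would use the full algorithm or a JSON diff library):

```javascript
// Keep a shadow copy of each resource as of the last successful sync.
// On sync, diff the working copy against the shadow and send only the
// changed fields as the PATCH body; on success, the shadow becomes current.
function diff(shadow, current) {
  const patch = {};
  for (const key of Object.keys(current)) {
    if (shadow[key] !== current[key]) patch[key] = current[key];
  }
  return patch;
}

const shadow = { title: "Report", status: "draft" };
const current = { title: "Report", status: "final" };
const patch = diff(shadow, current); // only the changed field goes over the wire
```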
https://www.firebase.com/
It's reliable and proven, and can be used as a backend and sync library for what you're after. But it costs money and requires some integration coding.
https://goinstant.com/ is also a good hosted option.
In some of my apps, I prefer to have both: a syncing db source AND another main database (mongo/express, php/mysql, etc.). Then each db handles what it's best at, with its own features (real-time vs. security, etc.). This is true regardless of the sync-db provider (be it Racer or Firebase or GoInstant...).
The app I am developing has many of the same requirements and is being built in AngularJS. In terms of future-proofing, there are two main concerns I have found: one is hacking attempts, requiring encryption and possibly one-time keys and a backend key manager; the other is support for WebSQL being dropped by the standards consortium in preference to IndexedDB. So finding an abstraction layer that can support both is important.
The solution set I have come up with is fairly straightforward: offline data is loaded into the UI first, and a request goes out to the REST server if in an online state. As for resolving data conflicts in a multi-user environment, that becomes a business-rule decision. My decision was to simplify the matter and not delve into data merging, but to use a microtime stamp comparison to determine which version should be kept and pushed out to clients. When in offline mode, store data as a dirty write and then push to the server when returning to an online state.
Or use ydn-db, which I am evaluating now as it has built-in support for AWS and Google Cloud Storage.
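The microtime-stamp comparison described in the previous answer boils down to last-write-wins; a rough sketch (all names here are illustrative, not any particular library's API):

```javascript
// Keep whichever version has the later modification stamp; no merging.
function resolve(serverRecord, clientRecord) {
  return clientRecord.modifiedAt > serverRecord.modifiedAt
    ? clientRecord
    : serverRecord;
}

// Offline edits queue up as dirty writes and are drained on reconnect.
function flushDirtyWrites(queue, serverStore) {
  for (const record of queue) {
    const existing = serverStore.get(record.id) || { modifiedAt: 0 };
    serverStore.set(record.id, resolve(existing, record));
  }
  queue.length = 0; // back online, queue drained
}
```

Note this silently discards the losing edit, which is exactly the business-rule trade-off the answer describes.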
Another suggestion:
Yjs leverages an OT-like algorithm to share a wide range of supported data types, and you have the option to store the shared data in IndexedDB (so it is available for offline editing).

Mobile/Desktop - What Strategy Makes Sense

My boss has big dreams.
He wants to write an application that runs on both the desktop and mobile devices. In addition, he wants it to be occasionally connected (can run without an internet connection). The application will rely heavily on data from a database.
Everyone he talks to keeps pushing HTML5/JavaScript on him for a write-once run-everywhere(ish) solution.
I don't have a lot of experience with this sort of environment--getting data from a database using JavaScript, ORMs for JavaScript and that kind of thing. I may be getting ahead of myself.
What are the kinds of things I should be looking at when trying to wade through a strategy to come as close to his goals as I can? Here are the assumptions and questions I have:
Expectations/Assumptions
I expect that I'll have to use one of the "embedded" or local databases that seem to have sprung up with HTML5 and Local Storage.
I expect that I'll also have to find some way to sync this data with data that's sitting out on a server somewhere.
I expect that the synchronization of this data will have to be homebrewed.
I would like to have some sort of ORM to make working with the data easier.
I expect to run into all sorts of weird things related to the size of the local database.
I expect to have to run all of the application's code on the client-side, since they are supposed to be able to run the application without an internet connection.
Question
What am I doing?
I'm kind of at a loss for even knowing where to start.
To turn this into something that has a chance of having right/wrong answers, here are the things that would be helpful to know:
Does the HTML5/JavaScript approach sound like a good way to go (considering the targets of occasionally connected, mobile, and desktop)?
What sort of frameworks and tools should I be looking at to make the development of the application easiest?
Is he asking for too much?
Thanks in advance for any advice/guidance you might have.
By Request: What does the application do?
The application is (more-or-less) a quoting/pricing application for a configurable product. There are a bunch of products (base cabinets, wall cabinets, etc.), a bunch of standard configurable options (wood, finish, door style), and a bunch of (less standard) modifications to them (reduce depth, increase height, etc.).
Depending on the standard configurable options you choose, it changes the base price of each product. You can then add modifications to them (which also come at a price).
The majority of the application exists already (albeit as a WPF application without locally stored data). It was designed so that it could be marketed to different manufacturers who make these configurable items (primarily kitchen cabinets and the like). Every manufacturer has their own rules about what woods/finishes/etc they offer and how they determine the base price of products (which also vary) and how you can mix/match the different woods/finishes etc.
Blah, blah, blah, every manufacturer is very unique.
To solve this problem we created a formula based approach where once you've set up their products/options/etc you can write some formulas to define not only the relationship between them but also how to price them.
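As a toy illustration of that formula-based approach (the names, multipliers, and prices here are invented, not the application's real rules), the formulas might be modeled as plain functions evaluated against a product's selected options:

```javascript
// Standard options adjust the base price; modifications add on top.
const formulas = {
  basePrice: (p) => p.base * (p.wood === "cherry" ? 1.25 : 1.0),
  withMods: (p) =>
    formulas.basePrice(p) + p.mods.reduce((sum, m) => sum + m.price, 0)
};

const cabinet = {
  base: 200,
  wood: "cherry",
  mods: [{ name: "reduce depth", price: 35 }]
};
const price = formulas.withMods(cabinet); // 200 * 1.25 + 35
```

Each manufacturer would then ship their own formula set rather than hard-coded pricing logic.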
In our current model, the application runs on the user's PC and the data is on a web server that the application makes calls to. He wants to turn this whole thing into an occasionally connected, mobile application that we can use on desktops as well.
There is quite a lot of data associated with it, since any manufacturer's data will contain images, descriptions, notes, thousands of products/modifications and lots of information about them (width, height, depth, number of doors, etc).
Does the HTML5/JavaScript approach sound like a good way to go (considering the targets of occasionally connected, mobile, and desktop)?
Yes. JavaScript is probably the way to go on this, however it won't be easy if you're not already JavaScript savvy. Large applications are a beast in JavaScript, especially on Mobile devices.
I know little about client-side database storage, but keeping the server and client databases in sync will almost certainly require AJAX, and XML or JSON transformations.
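One common shape for that sync exchange, shown with the transport stubbed out (the endpoint and field names are assumptions): pull everything changed since the last sync, apply it locally, then push local dirty rows back.

```javascript
// Apply a batch of server-side changes (already parsed from JSON) to the
// local store; a Map stands in for the client-side database here.
function applyServerChanges(localStore, changes) {
  for (const row of changes) localStore.set(row.id, row);
  return localStore;
}

// Collect locally modified rows to POST back once a connection exists.
function collectDirtyRows(localStore) {
  return [...localStore.values()].filter(r => r.dirty);
}

// In the real app, `changes` would come from an AJAX call such as
// GET /api/changes?since=<lastSyncTimestamp> (hypothetical endpoint).
const localStore = new Map();
applyServerChanges(localStore, [{ id: 1, name: "base cabinet" }]);
```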
Consider security and size of the data on the client (should the client have access to all the data stored on their machine?).
What sort of frameworks and tools should I be looking at to make the development of the application easiest?
I use jQuery for all DOM manipulation, event hooks, and AJAX, plus many other features/plugins for other things. I highly recommend taking a look at it.
Firebug (<--must have)
Is he asking for too much?
The connectionless aspect may be too much. I wouldn't be surprised if it doubled the coding time.
You may want to provide some more information on what the application does. If it's a huge UI heavy CMS this project could take years for a single person. However if it's just a little Nerd Dinner-like app, it shouldn't be too bad.
Edit after question update
I would test the client-side database approach on a mobile device first. You may run into unforeseen limitations (data transmission speed, data size) in those environments (Android browser, Mobile Safari). The when and what to update once you have an internet connection to work with is also a big factor in determining the level of effort. Testing the client-side database limitations may help answer these questions.
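A first step in that testing is simply probing what each target browser offers; a minimal capability check (pass in `window` in the browser):

```javascript
// Report which client-side storage options the environment exposes.
function detectStorage(globalObj) {
  return {
    indexedDB: typeof globalObj.indexedDB !== "undefined",
    webSQL: typeof globalObj.openDatabase === "function",
    localStorage: typeof globalObj.localStorage !== "undefined"
  };
}
// e.g. detectStorage(window) in each target browser, then time a few
// hundred writes against whichever backend is available before committing.
```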
The rest seems fairly straight-forward to me. Good luck. =)
