✓ UeberDB turns every database into a simple key-value store by providing a layer of abstraction between your software and your database.
✓ UeberDB uses a cache and buffer to make databases faster. Reads are cached and writes are done in bulk. Both can be turned off.
✓ UeberDB does bulk writing, which reduces the overhead of database transactions.
✓ UeberDB uses a simple and clean syntax, so getting started is easy.
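The cache-and-buffer idea can be illustrated with a small in-memory model. This is a minimal sketch, not ueberDB's implementation: `WriteBehindCache`, `flush()` and the `driver` object are hypothetical names, and the driver is assumed to expose an async `get(key)` and an async `doBulk(ops)`. Writes collect in a pending map and reach the driver as one bulk call. Reads check pending writes first, then the read cache, and only then the driver.

```javascript
// Hypothetical model of a read cache plus write buffer (not ueberdb2 code).
class WriteBehindCache {
  constructor(driver) {
    this.driver = driver; // assumed shape: { async get(key), async doBulk(ops) }
    this.cache = new Map(); // values already read from or handed to the driver
    this.pending = new Map(); // writes not yet sent to the driver
  }

  async get(key) {
    // Uncommitted writes win, so a reader always sees its own writes.
    if (this.pending.has(key)) return this.pending.get(key);
    if (this.cache.has(key)) return this.cache.get(key);
    const value = await this.driver.get(key);
    this.cache.set(key, value);
    return value;
  }

  set(key, value) {
    // Repeated writes to the same key collapse into one pending entry.
    this.pending.set(key, value);
  }

  // Send all pending writes to the driver as a single bulk call.
  async flush() {
    if (this.pending.size === 0) return 0;
    const ops = [...this.pending].map(([key, value]) => ({ type: "set", key, value }));
    // Move values into the read cache first so reads during the flush still hit.
    for (const { key, value } of ops) this.cache.set(key, value);
    this.pending.clear();
    await this.driver.doBulk(ops);
    return ops.length;
  }
}

// A trivial in-memory "driver" used to exercise the model.
function makeMemoryDriver() {
  const store = new Map();
  let bulkCalls = 0;
  return {
    store,
    get bulkCalls() { return bulkCalls; },
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async doBulk(ops) {
      bulkCalls++;
      for (const op of ops) store.set(op.key, op.value);
    },
  };
}
```

A real buffer would call `flush()` on a timer; in ueberDB the `cache` and `writeInterval` wrapper options described later in this README control these two layers, and setting either to 0 turns it off.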
Supported databases:

- Couch
- Dirty
- Elasticsearch
- Maria
- Memory: an in-memory ephemeral database
- Mongo
- MsSQL
- MySQL
- Postgres (single connection and with connection pool)
- Redis
- Rethink
- RustyDB
- SQLite
- Surrealdb
```shell
npm install ueberdb2
```
```javascript
const ueberdb = require("ueberdb2");

(async () => {
  // mysql
  const db = new ueberdb.Database("mysql", {
    user: "root",
    host: "localhost",
    password: "",
    database: "store",
    engine: "InnoDB",
  });

  // dirty to file system
  // const db = new ueberdb.Database("dirty", { filename: "var/dirty.db" });

  await db.init();
  try {
    await db.set("valueA", { a: 1, b: 2 });
    console.log("valueA is", await db.get("valueA"));
  } finally {
    await db.close();
  }
})();
```

```javascript
const ueberdb = require("ueberdb2");

(async () => {
  const db = new ueberdb.Database("dirty", { filename: "var/dirty.db" });
  await db.init();
  try {
    await Promise.all([
      db.set("valueA", { a: 1, b: 2 }),
      db.set("valueA:h1", { a: 1, b: 2 }),
      db.set("valueA:h2", { a: 3, b: 4 }),
    ]);
    // prints [ 'valueA:h1', 'valueA:h2' ]
    console.log(await db.findKeys("valueA:*", null));
  } finally {
    await db.close();
  }
})();
```

`findKeys()` materialises every matching key into a single array. On very large keyspaces that loads the whole result set into memory at once; see ether/etherpad#7830, where a multi-million-row `sessionstorage:*` sweep OOMed the host. `findKeysPaged()` walks the same keyspace in fixed-size pages using an exclusive `after` cursor:
```javascript
const ueberdb = require("ueberdb2");

// Connection settings for your MySQL server (same shape as the first example).
const settings = {
  user: "root",
  host: "localhost",
  password: "",
  database: "store",
};

(async () => {
  const db = new ueberdb.Database("mysql", settings);
  await db.init();
  try {
    let after; // exclusive cursor: the last key of the previous page
    let total = 0;
    while (true) {
      const page = await db.findKeysPaged("sessionstorage:*", null, {
        limit: 500,
        ...(after != null ? { after } : {}),
      });
      if (page.length === 0) break; // an empty page means the sweep is done
      total += page.length;
      for (const key of page) {
        // ...process key...
      }
      after = page[page.length - 1];
    }
    console.log(`processed ${total} keys`);
  } finally {
    await db.close();
  }
})();
```

Semantics:

- Keys are returned in ascending byte order, up to `limit` per call.
- `after` is exclusive: pass the last returned key as the next `after` value. You have reached the end when a call returns an empty array.
- `limit` must be a positive integer; non-positive or non-integer values throw.
- Native implementations: **mysql** (ranged ``BINARY `key` > ?``) and **postgres** (`key > $n`). All other backends fall back to `findKeys()` plus JS-side slicing via the cache layer. That is correct, but it defeats the OOM-mitigation purpose. PRs for native paged paths on other backends are welcome.
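The JS-side fallback described above can be modelled in a few lines. This is a minimal sketch under stated assumptions: `pageKeys` and `collectAll` are hypothetical helpers, not ueberdb2 exports, and keys are compared as UTF-8 bytes with Node's `Buffer.compare` to match the ascending byte order described above.

```javascript
// Hypothetical helpers, not ueberdb2 exports: they model the documented
// findKeysPaged() contract on top of an in-memory list of keys.
function byteCompare(a, b) {
  // Compare as UTF-8 bytes to match "ascending byte order".
  return Buffer.compare(Buffer.from(a, "utf8"), Buffer.from(b, "utf8"));
}

function pageKeys(allKeys, { limit, after } = {}) {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const sorted = [...allKeys].sort(byteCompare);
  // `after` is exclusive: keep only keys strictly greater than the cursor.
  const rest = after == null ? sorted : sorted.filter((k) => byteCompare(k, after) > 0);
  return rest.slice(0, limit);
}

// The same cursor loop as the findKeysPaged() example, run against pageKeys().
function collectAll(allKeys, limit) {
  const out = [];
  let after;
  while (true) {
    const page = pageKeys(allKeys, { limit, after });
    if (page.length === 0) break;
    out.push(...page);
    after = page[page.length - 1];
  }
  return out;
}
```

Because every call sorts and filters the full key list, this fallback still holds all matching keys in memory, which is why only the native SQL paths actually bound memory use.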
ueberDB can store complex JSON objects. Sometimes you only want to get or set a
specific (sub-)property of the stored object. The .getSub() and .setSub()
methods make this easier.
```javascript
const value = await db.getSub(key, propertyPath);
db.getSub(key, propertyPath, callback);
```

Fetches the object stored at `key`, walks the property path given in `propertyPath`, and returns the value at that location. `propertyPath` must be an array. If `propertyPath` is an empty array then `getSub()` is equivalent to `get()`. Returns a nullish value (`null` or `undefined`) if the record does not exist or if the given property path does not exist.
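The lookup described above amounts to walking `propertyPath` through a JSON value. A minimal sketch, assuming the stored value is ordinary JSON; `getPath` is a hypothetical helper for illustration, not part of ueberdb2:

```javascript
// Hypothetical helper (not part of ueberdb2): walks `propertyPath` through a
// plain JSON value the way getSub() is described above.
function getPath(value, propertyPath) {
  if (!Array.isArray(propertyPath)) {
    throw new TypeError("propertyPath must be an array");
  }
  let current = value;
  for (const prop of propertyPath) {
    // Missing record or missing intermediate property: return a nullish value.
    if (current == null || typeof current !== "object") return undefined;
    if (!Object.prototype.hasOwnProperty.call(current, prop)) return undefined;
    current = current[prop];
  }
  // An empty path returns the whole value, like get().
  return current;
}
```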
Examples:

```javascript
// Assumes `db` is an initialized ueberdb.Database, as in the examples above.
(async () => {
  const key = "exampleKey";
  await db.set(key, { prop1: { prop2: ["value"] } });
  const val1 = await db.getSub(key, ["prop1", "prop2", "0"]);
  console.log("1.", val1); // prints "1. value"
  const val2 = await db.getSub(key, ["prop1", "prop2"]);
  console.log("2.", val2); // prints "2. [ 'value' ]"
  const val3 = await db.getSub(key, ["prop1"]);
  console.log("3.", val3); // prints "3. { prop2: [ 'value' ] }"
  const val4 = await db.getSub(key, []);
  console.log("4.", val4); // prints "4. { prop1: { prop2: [ 'value' ] } }"
  const val5 = await db.getSub(key, ["does", "not", "exist"]);
  console.log("5.", val5); // prints "5. null" or "5. undefined"
})();
```

```javascript
await db.setSub(key, propertyPath, value);
db.setSub(key, propertyPath, value, callback);
```

Fetches the object stored at `key`, walks the property path given in `propertyPath`, and sets the value at that location to `value`. `propertyPath` must be an array. If `propertyPath` is an empty array then `setSub()` is equivalent to `set()`. Empty objects are created as needed if the property path does not exist (including if `key` does not exist in the database). It is an error to attempt to set a property on a non-object.
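The rules above can be expressed as a small path-writing function. This is a minimal sketch, assuming plain JSON values; `setPath` is a hypothetical helper (not part of ueberdb2) that returns the value which would end up stored, and it mutates an existing object in place.

```javascript
// Hypothetical helper (not part of ueberdb2): returns the value that would be
// stored after writing `newValue` at `propertyPath`, following the setSub()
// rules described above. Existing objects are modified in place.
function setPath(stored, propertyPath, newValue) {
  if (!Array.isArray(propertyPath)) {
    throw new TypeError("propertyPath must be an array");
  }
  if (propertyPath.length === 0) return newValue; // same as set()
  // A missing record starts out as an empty object.
  const root = stored == null ? {} : stored;
  let current = root;
  for (let i = 0; i < propertyPath.length; i++) {
    const prop = propertyPath[i];
    if (current === null || typeof current !== "object") {
      throw new TypeError(
        `Cannot set property ${JSON.stringify(prop)} on non-object ${JSON.stringify(current)}`);
    }
    if (i === propertyPath.length - 1) {
      current[prop] = newValue;
    } else {
      // Create empty objects for missing intermediate properties.
      if (current[prop] == null) current[prop] = {};
      current = current[prop];
    }
  }
  return root;
}
```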
Examples:

```javascript
// Assumption: db does not yet have any records.
(async () => {
  // Equivalent to db.set('key1', 'value'):
  await db.setSub('key1', [], 'value');
  // Equivalent to db.set('key2', {prop1: {prop2: {0: 'value'}}}):
  await db.setSub('key2', ['prop1', 'prop2', '0'], 'value');
  await db.set('key3', {prop1: 'value'});
  // Equivalent to db.set('key3', {prop1: 'value', prop2: 'other value'}):
  await db.setSub('key3', ['prop2'], 'other value');
  // TypeError: Cannot set property "badProp" on non-object "value":
  await db.setSub('key3', ['prop1', 'badProp'], 'foo');
})();
```

Set the `cache` wrapper option to 0 to force every read operation to go directly to the database driver (except for reads of written values that have not yet been committed to the database):
```javascript
const ueberdb = require("ueberdb2");

(async () => {
  const db = new ueberdb.Database("dirty", { filename: "var/dirty.db" }, { cache: 0 });
  await db.init();
  try {
    await db.set("valueA", { a: 1, b: 2 });
    const value = await db.get("valueA");
    console.log(JSON.stringify(value));
  } finally {
    await db.close();
  }
})();
```

Set the `writeInterval` wrapper option to 0 to force writes to go directly to the database driver:
```javascript
const ueberdb = require("ueberdb2");

(async () => {
  const db = new ueberdb.Database("dirty", { filename: "var/dirty.db" }, { writeInterval: 0 });
  await db.init();
  try {
    await db.set("valueA", { a: 1, b: 2 });
    const value = await db.get("valueA");
    console.log(JSON.stringify(value));
  } finally {
    await db.close();
  }
})();
```

| Database | Get | Set | findKeys | findKeysPaged | Remove | getSub | setSub | doBulk | CI Coverage |
|---|---|---|---|---|---|---|---|---|---|
| cassandra | ✓ | ✓ | * | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| couchdb | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| dirty | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| dirty_git | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| elasticsearch | ✓ | ✓ | * | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| maria | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| mysql | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| postgres | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| redis | ✓ | ✓ | * | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| rethinkdb | ✓ | ✓ | * | ✓ | ✓ | ✓ | ✓ | ✓ | |
| rustydb | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| sqlite | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| surrealdb | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Avoid the following characters in keys, as they will cause `findKeys` to fail: `\^$.|?*+()[{`
The following backends have limitations on `findKeys`:
- redis (only keys of the format `*:*:*`)
- cassandra (only keys of the format `*:*:*`)
- elasticsearch (only keys of the format `*:*:*`)
- rethink (currently doesn't work)

For details on how it works, please refer to the wiki: https://github.com/ether/UeberDB/wiki/findKeys-functionality
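One way to see why those characters cause trouble: if a glob-style pattern such as `valueA:*` is turned into a regular expression by rewriting only `*`, every other regex metacharacter in the pattern keeps its special meaning. The sketch below assumes that kind of translation purely for illustration; it is not taken from ueberDB's source, and `naiveGlobToRegExp` and `escapedGlobToRegExp` are hypothetical names.

```javascript
// Hypothetical: translate a glob where `*` means "any run of characters".
function naiveGlobToRegExp(pattern) {
  // Only `*` is rewritten; `.`, `(`, `[` etc. are passed through as regex syntax.
  return new RegExp(`^${pattern.replace(/\*/g, ".*")}$`);
}

// Escaping every metacharacter except `*` first keeps the other characters literal.
function escapedGlobToRegExp(pattern) {
  const escaped = pattern.replace(/[\\^$.|?+()[\]{}]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}
```

With the naive version, `pad.1:*` also matches `padX1:rev` (the `.` matches any character), and `pad(1:*` throws a `SyntaxError` because the group is never closed.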
`findKeysPaged` is supported on every backend, but only the SQL backends (mysql, mariadb, postgres) iterate the keyspace with a server-side ranged query. That is the variant that actually bounds memory for the OOM case it was added for (ether/etherpad#7830). Other backends share the same API surface via a wrapper that falls back to `findKeys()` plus in-JS slicing; this is correct, but the underlying `findKeys()` still materialises every matching key, so the OOM-mitigation benefit only applies to the SQL backends. PRs adding native paged paths for the rest are welcome.
To scale UeberDB you should use sharding, especially for real-time applications. An example of this is sharding pads within Etherpad based on their initial pad author's geographical location. High availability and disaster recovery can be provided through replication of your database; however, YMMV when passing settings to your database library. Do not be under the illusion that UeberDB provides any stateless capabilities; it does not. One option is to use something like RethinkDB and set `cache` to 0, but YMMV.
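As an illustration of routing keys to shards, the sketch below picks one of several database settings objects from a pad ID. It swaps in a plain hash of the pad ID rather than the geographical scheme described above, and `shardFor`, `padIdFromKey`, the `pad:<id>` key layout and the shard list are all hypothetical; in practice each entry would configure its own `ueberdb.Database` instance.

```javascript
const crypto = require("crypto");

// Hypothetical shard list; in practice each entry would configure one database.
const shards = [
  { name: "eu", settings: { host: "db-eu.example.com" } },
  { name: "us", settings: { host: "db-us.example.com" } },
  { name: "ap", settings: { host: "db-ap.example.com" } },
];

// Map a pad ID to a shard index deterministically, so every key that belongs
// to the same pad always lands on the same database.
function shardFor(padId, shardCount = shards.length) {
  const digest = crypto.createHash("sha1").update(padId).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// Extract the pad ID from a key such as "pad:myPad:revs:3" (hypothetical layout).
function padIdFromKey(key) {
  const [, padId] = key.split(":");
  return padId;
}
```

A modulo over the shard count reassigns most pads whenever a shard is added, so an explicit lookup table (or a location-based assignment like the one described above) is easier to grow.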
Your key length will be limited by the database you choose, but keep portability within your application in mind.
`doBulk` operations that chain, i.e. a large number of `.set()` calls issued without pausing to let pending writes clear, can cause a JavaScript heap out-of-memory error. This is very rare and is usually due to a bug in the software causing constant writes to the database.
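If you do need to write a large number of records, awaiting the writes in bounded batches gives pending writes a chance to complete instead of queuing everything at once. A minimal sketch, assuming only that `db.set()` returns a promise as in the examples above; `setInBatches`, its default batch size and the `makeFakeDb` stand-in are hypothetical.

```javascript
// Write `entries` ([key, value] pairs) in batches of `batchSize`, waiting for
// each batch to settle before queuing the next, so at most `batchSize` writes
// are in flight at any time.
async function setInBatches(db, entries, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let written = 0;
  for (let i = 0; i < entries.length; i += batchSize) {
    const batch = entries.slice(i, i + batchSize);
    await Promise.all(batch.map(([key, value]) => db.set(key, value)));
    written += batch.length;
  }
  return written;
}

// Stand-in with an async set() that records the peak number of concurrent writes.
function makeFakeDb() {
  const store = new Map();
  let inFlight = 0;
  let peak = 0;
  return {
    store,
    get peak() { return peak; },
    async set(key, value) {
      inFlight++;
      peak = Math.max(peak, inFlight);
      await new Promise((resolve) => setImmediate(resolve));
      store.set(key, value);
      inFlight--;
    },
  };
}
```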
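You should create your MySQL/MariaDB database with the `utf8mb4_bin` collation (and the matching `utf8mb4` character set), e.g. `CREATE DATABASE store CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`.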
The postgres driver uses a pg connection
pool. The pool-related settings below are part of the Settings type; other
pg pool options are forwarded to the
pool at runtime as well, but aren't in the type yet (so TypeScript callers may
need a cast for those). The defaults applied by ueberDB2 are:
| Setting | Default | Notes |
|---|---|---|
| `max` | 20 | Maximum connections in the pool. |
| `min` | 4 | Minimum warm connections kept open (honored by pg >= 8.16). |
| `idleTimeoutMillis` | 1000 | Idle reaping only applies to connections above `min`. |
| `keepAlive` | `true` | Enables TCP keep-alive on pooled sockets. |
| `keepAliveInitialDelayMillis` | 10000 | Delay before the first keep-alive probe (ms). |
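The precedence between these defaults and your own settings can be sketched as a shallow merge. This assumes user values simply override the defaults key by key (consistent with the postgres example in this section, which lowers `keepAliveInitialDelayMillis`); `withPoolDefaults` is a hypothetical helper, not the driver's code.

```javascript
// Defaults as listed in the table above.
const POOL_DEFAULTS = Object.freeze({
  max: 20,
  min: 4,
  idleTimeoutMillis: 1000,
  keepAlive: true,
  keepAliveInitialDelayMillis: 10000,
});

// Hypothetical helper: user-supplied settings override the defaults key by key,
// and any extra pg pool options pass through untouched.
function withPoolDefaults(settings = {}) {
  return { ...POOL_DEFAULTS, ...settings };
}
```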
Because `min` connections are kept warm, those sockets can sit idle
indefinitely. Anything between your application and PostgreSQL that drops idle
connections will eventually close them, and the next use then fails with
`Connection terminated unexpectedly`. There are two distinct flavours of
"idle drop", and they need different handling:

- Kernel / NAT / firewall / conntrack state expiry: these expire idle TCP flows when no packets at all are seen. `keepAlive` (enabled by default, 10s initial delay) fixes this: the OS emits periodic keep-alive probes so the flow never looks dead. Lower `keepAliveInitialDelayMillis` if the idle window is shorter than 10s.
- Application-layer proxy idle timeouts, e.g. HAProxy `timeout server`/`timeout client`, pgbouncer, many cloud LBs: these count data inactivity, and TCP keep-alive probes are empty kernel-level segments that the proxy does not see as activity. `keepAlive` does not help here; the proxy will still close the connection on schedule. For these you must either raise the proxy's idle timeout (for HAProxy in `mode tcp`, `timeout tunnel` is the knob for long-lived connections) or rely on the pool recovering from the drop.
That recovery is the important part, and is always on regardless of proxy
config: a pool error handler is attached, so a dropped idle connection is
logged and discarded (the pool transparently reconnects on next use) instead of
being re-thrown as an uncaught EventEmitter 'error' that crashes the host
process. The drop itself is harmless; what used to be fatal was the missing
handler.
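Why the missing handler used to be fatal can be shown with Node's own `EventEmitter`: emitting `'error'` with no listener attached throws. The sketch below uses a plain `EventEmitter` as a stand-in for a pool so it runs without a database; `simulateIdleDrop`, `dropWithoutHandler` and `dropWithHandler` are hypothetical names.

```javascript
const { EventEmitter } = require("events");

// Stand-in for a connection pool: pg's Pool is an EventEmitter, but this is a
// plain EventEmitter so the example runs without a database.
function simulateIdleDrop(pool) {
  pool.emit("error", new Error("Connection terminated unexpectedly"));
}

// Without an 'error' listener, emit("error") throws the error. In a real pool
// the emit happens inside a socket callback, so the throw is uncaught.
function dropWithoutHandler() {
  const pool = new EventEmitter();
  try {
    simulateIdleDrop(pool);
    return "survived";
  } catch (err) {
    return `crashed: ${err.message}`;
  }
}

// With a listener attached, the same event is just logged and discarded.
function dropWithHandler(log = []) {
  const pool = new EventEmitter();
  pool.on("error", (err) => log.push(`pool error (ignored): ${err.message}`));
  simulateIdleDrop(pool);
  return log;
}
```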
```javascript
const db = new ueberdb.Database("postgres", {
  host: "127.0.0.1",
  user: "ueberdb",
  password: "ueberdb",
  database: "ueberdb",
  // Helps against kernel/NAT/firewall idle expiry. Does NOT defeat an
  // application-layer proxy idle timeout (e.g. HAProxy `timeout server`);
  // raise the proxy timeout for that. The pool reconnects either way.
  keepAlive: true,
  keepAliveInitialDelayMillis: 5000,
});
```

If you enabled TLS on your Redis database (available since Redis 6.0), you will need to change your connection parameters. Here is an example:
```javascript
const db = new ueberdb.Database("redis", { url: "rediss://localhost" });
```

Do not provide a `host` value.
If you don't provide a certificate on the client side, you need to set the
environment variable `NODE_TLS_REJECT_UNAUTHORIZED=0` and launch `redis-server`
with the flag `--tls-auth-clients no` so that it accepts connections.
- Add the database driver to `package.json`; this will happen automatically if you run `npm install %yourdatabase%`.
- Create `databases/DATABASENAME_db.js` and have it export a `Database` class that derives from `lib/AbstractDatabase.js`. Implement the required functions.
- Add a service for the database to the test job in `.github/workflows/npmpublish.yml`.
- Add an entry to `test/lib/databases.js` for your database and configure it to work with the service added to the GitHub workflow.
- Install and start the database server and configure it to work with the settings in your `test/lib/databases.js` entry.
- Run `npm test` to ensure that it works.
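For orientation, here is a hypothetical shape for the class in step 2, written as a standalone in-memory backend. It covers the operations in the feature table (get, set, remove, findKeys, doBulk) plus `init`/`close`, but it deliberately does not extend `lib/AbstractDatabase.js` so that it runs on its own. The method signatures (async vs. callback) and the `{ type, key, value }` bulk-operation shape are assumptions; take the exact contract from `lib/AbstractDatabase.js`.

```javascript
// Hypothetical shape of a minimal in-memory backend. It does NOT extend
// lib/AbstractDatabase.js so that it runs on its own; a real driver would.
class Database {
  constructor(settings = {}) {
    this.settings = settings;
    this.data = null;
  }

  async init() { this.data = new Map(); }

  async close() { this.data = null; }

  async get(key) { return this.data.has(key) ? this.data.get(key) : null; }

  async set(key, value) { this.data.set(key, value); }

  async remove(key) { this.data.delete(key); }

  // `*` is the only wildcard; every other character is matched literally here.
  async findKeys(key, notKey) {
    const toRegExp = (glob) =>
      new RegExp(`^${glob.replace(/[\\^$.|?+()[\]{}]/g, "\\$&").replace(/\*/g, ".*")}$`);
    const match = toRegExp(key);
    const exclude = notKey == null ? null : toRegExp(notKey);
    return [...this.data.keys()].filter((k) => match.test(k) && !(exclude && exclude.test(k)));
  }

  // Assumed operation shape: { type: "set" | "remove", key, value }.
  async doBulk(operations) {
    for (const op of operations) {
      if (op.type === "set") this.data.set(op.key, op.value);
      else if (op.type === "remove") this.data.delete(op.key);
      else throw new Error(`unknown bulk operation type: ${op.type}`);
    }
  }
}

// The file exports the class, as the step above asks.
module.exports = { Database };
```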
- Dropped broken databases: CrateDB, LevelDB, LMDB (probably a breaking change for some people)
- Introduced CI.
- Introduced better testing.
- Fixed broken database clients, e.g. Redis.
- Updated dependencies where possible.
- Tidied file structure.
- Improved documentation.
- Sensible name for software makes it clear that it's maintained by The Etherpad Foundation.
- Made `db.init` awaitable (async).
- I suck at hiding Easter eggs.
Dirty_git will commit and push to Git on every set. To use it, run `git init` or
`git clone` within your dirty database location and then set your upstream, e.g.
`git remote add origin git://whztevz`.
The logic behind dirty git is that you can still use dirty, but you can also have offsite backups. It's noisy and spammy, but it can be useful.