Commit fc268ab

Merge branch 'fix-infinite-cutover-loop' of github.com:github/gh-ost into fix-infinite-cutover-loop

Author: Shlomi Noach
2 parents 7f72872 + 7517238 commit fc268ab

175 files changed, +26128 -418 lines changed

README.md

Lines changed: 2 additions & 1 deletion
@@ -58,9 +58,10 @@ More tips:
 Also see:

 - [requirements and limitations](doc/requirements-and-limitations.md)
+- [common questions](doc/questions.md)
 - [what if?](doc/what-if.md)
 - [the fine print](doc/the-fine-print.md)
-- [Questions](https://github.com/github/gh-ost/issues?q=label%3Aquestion)
+- [Community questions](https://github.com/github/gh-ost/issues?q=label%3Aquestion)

 ## What's in a name?

doc/cheatsheet.md

Lines changed: 5 additions & 1 deletion
@@ -146,8 +146,12 @@ gh-ost --allow-master-master --assume-master-host=a.specific.master.com

 Topologies using _tungsten replicator_ are peculiar in that the participating servers are not actually aware they are replicating. The _tungsten replicator_ looks just like another app issuing queries on those hosts. `gh-ost` is unable to identify that a server participates in a _tungsten_ topology.

-If you choose to migrate directly on master (see above), there's nothing special you need to do. If you choose to migrate via replica, then you must supply the identity of the master, and indicate this is a tungsten setup, as follows:
+If you choose to migrate directly on master (see above), there's nothing special you need to do.
+
+If you choose to migrate via replica, then you need to make sure Tungsten is configured with log-slave-updates parameter (note this is different from MySQL's own log-slave-updates parameter), otherwise changes will not be in the replica's binlog, causing data to be corrupted after table swap. You must also supply the identity of the master, and indicate this is a tungsten setup, as follows:

 ```
 gh-ost --tungsten --assume-master-host=the.topology.master.com
 ```
+
+Also note that `--switch-to-rbr` does not work for a Tungsten setup as the replication process is external, so you need to make sure `binlog_format` is set to ROW before Tungsten Replicator connects to the server and starts applying events from the master.

doc/command-line-flags.md

Lines changed: 4 additions & 0 deletions
@@ -130,3 +130,7 @@ See `approve-renamed-columns`
 ### test-on-replica

 Issue the migration on a replica; do not modify data on master. Useful for validating, testing and benchmarking. See [testing-on-replica](testing-on-replica.md)
+
+### timestamp-old-table
+
+Makes the _old_ table include a timestamp value. The _old_ table is what the original table is renamed to at the end of a successful migration. For example, if the table is `gh_ost_test`, then the _old_ table would normally be `_gh_ost_test_del`. With `--timestamp-old-table` it would be, for example, `_gh_ost_test_20170221103147_del`.
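
The naming scheme described for `--timestamp-old-table` can be sketched in Go as follows. This is a minimal illustration, assuming the timestamp comes from the migration start time (as the `go/base/context.go` change in this commit does). The names `oldTableName` and `exampleStart` are hypothetical and not part of `gh-ost`, and `time.Format` is used here in place of the `fmt.Sprintf` call in the commit.

```go
package main

import (
	"fmt"
	"time"
)

// exampleStart is an illustrative migration start time (hypothetical value).
var exampleStart = time.Date(2017, time.February, 21, 10, 31, 47, 0, time.UTC)

// oldTableName is a hypothetical helper that mirrors the documented naming
// scheme; it is not gh-ost's own function.
func oldTableName(table string, start time.Time, timestamped bool) string {
	if timestamped {
		// The reference layout "20060102150405" renders as YYYYMMDDhhmmss.
		return fmt.Sprintf("_%s_%s_del", table, start.Format("20060102150405"))
	}
	return fmt.Sprintf("_%s_del", table)
}

func main() {
	fmt.Println(oldTableName("gh_ost_test", exampleStart, false)) // _gh_ost_test_del
	fmt.Println(oldTableName("gh_ost_test", exampleStart, true))  // _gh_ost_test_20170221103147_del
}
```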

doc/interactive-commands.md

Lines changed: 12 additions & 3 deletions
@@ -26,9 +26,9 @@ Both interfaces may serve at the same time. Both respond to simple text command,
 - The `critical-load` format must be: `some_status=<numeric-threshold>[,some_status=<numeric-threshold>...]`'
 - For example: `Threads_running=1000,threads_connected=5000`, and you would then write/echo `critical-load=Threads_running=1000,threads_connected=5000` to the socket.
 - `nice-ratio=<ratio>`: change _nice_ ratio: 0 for aggressive (not nice, not sleeping), positive integer `n`:
-- For any `1ms` spent copying rows, spend `n*1ms` units of time sleeping.
-- Examples: assume a single rows chunk copy takes `100ms` to complete.
-- `nice-ratio=0.5` will cause `gh-ost` to sleep for `50ms` immediately following.
+- For any `1ms` spent copying rows, spend `n*1ms` units of time sleeping.
+- Examples: assume a single rows chunk copy takes `100ms` to complete.
+- `nice-ratio=0.5` will cause `gh-ost` to sleep for `50ms` immediately following.
 - `nice-ratio=1` will cause `gh-ost` to sleep for `100ms`, effectively doubling runtime
 - value of `2` will effectively triple the runtime; etc.
 - `throttle-query`: change throttle query
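
The `nice-ratio` arithmetic in this hunk (sleep `n` times as long as the chunk copy took) can be sketched in Go. This is an illustration of the documented rule only; `sleepAfterChunk` is a hypothetical helper, not `gh-ost` code.

```go
package main

import (
	"fmt"
	"time"
)

// sleepAfterChunk returns how long to sleep after a chunk copy that took
// copyDuration, for a given nice ratio. Hypothetical helper for illustration.
func sleepAfterChunk(copyDuration time.Duration, niceRatio float64) time.Duration {
	if niceRatio <= 0 {
		return 0 // ratio 0: aggressive, no sleeping
	}
	return time.Duration(float64(copyDuration) * niceRatio)
}

func main() {
	chunk := 100 * time.Millisecond
	for _, ratio := range []float64{0, 0.5, 1, 2} {
		sleep := sleepAfterChunk(chunk, ratio)
		// Total time per chunk is copy time plus sleep time.
		fmt.Printf("nice-ratio=%v: sleep %v, total %v\n", ratio, sleep, chunk+sleep)
	}
}
```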
@@ -38,6 +38,10 @@ Both interfaces may serve at the same time. Both respond to simple text command,
 - `unpostpone`: at a time where `gh-ost` is postponing the [cut-over](cut-over.md) phase, instruct `gh-ost` to stop postponing and proceed immediately to cut-over.
 - `panic`: immediately panic and abort operation

+### Querying for data
+
+For commands that accept an argument as value, pass `?` (question mark) to _get_ the current value rather than _set_ a new one.
+
 ### Examples

 While migration is running:
@@ -63,6 +67,11 @@ $ echo "chunk-size=250" | nc -U /tmp/gh-ost.test.sample_data_0.sock
 # Serving on TCP port: 10001
 ```

+```shell
+$ echo "chunk-size=?" | nc -U /tmp/gh-ost.test.sample_data_0.sock
+250
+```
+
 ```shell
 $ echo throttle | nc -U /tmp/gh-ost.test.sample_data_0.sock
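
The `nc -U` query in this hunk could also be issued from a program. The following is a minimal Go sketch, assuming the server answers a single newline-terminated command with a single line, which is what the `chunk-size=?` example suggests but is not confirmed here. `queryCommand` and `sendCommand` are hypothetical helpers, and the socket path is the one from the example.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// queryCommand builds the "get current value" form of a command,
// for example "chunk-size=?".
func queryCommand(name string) string {
	return name + "=?"
}

// sendCommand writes one command line to the unix socket and returns the
// first line of the reply. Reading exactly one line is an assumption.
func sendCommand(socketPath, command string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, command); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil && reply == "" {
		return "", err
	}
	return strings.TrimSpace(reply), nil
}

func main() {
	// The socket only exists while a migration is running.
	reply, err := sendCommand("/tmp/gh-ost.test.sample_data_0.sock", queryCommand("chunk-size"))
	if err != nil {
		fmt.Println("could not query gh-ost:", err)
		return
	}
	fmt.Println("chunk-size is", reply)
}
```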

doc/questions.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# How?
+
+### How does the cut-over work? Is it really atomic?
+
+The cut-over phase, where the original table is swapped away, and the _ghost_ table takes its place, is an atomic, blocking, controlled operation.
+
+- Atomic: the tables are swapped together. There is no gap where your table does not exist.
+- Blocking: all app queries involving the migrated (original) table either operate on the original table, or are blocked, or proceed to operate on the _new_ table (formerly the _ghost_ table, now swapped in).
+- Controlled: the cut-over times out at a pre-defined threshold, is atomically aborted, and is then re-attempted. Cut-over only takes place when no lag is present and no other throttling reason is found. The cut-over step itself gets high priority and is never throttled.
+
+Read more on [cut-over](cut-over.md) and on the [cut-over design Issue](https://github.com/github/gh-ost/issues/82).
+
+
+# Is it possible to?
+
+### Is it possible to add a UNIQUE KEY?
+
+Adding a `UNIQUE KEY` is possible, on the condition that no violation will occur. That is, you must make sure there aren't any violating rows in your table before and during the migration.
+
+At this time there is no equivalent to `ALTER IGNORE`, where duplicates are implicitly and silently thrown away. The MySQL `5.7` docs say:
+
+> As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.
+
+It is therefore unlikely that `gh-ost` will support this behavior.
+
+# Why

go/base/context.go

Lines changed: 14 additions & 6 deletions
@@ -121,6 +121,7 @@ type MigrationContext struct {
 	OkToDropTable bool
 	InitiallyDropOldTable bool
 	InitiallyDropGhostTable bool
+	TimestampOldTable bool // Should old table name include a timestamp
 	CutOverType CutOver
 	ReplicaServerId uint

@@ -135,7 +136,9 @@ type MigrationContext struct {
 	OriginalBinlogFormat string
 	OriginalBinlogRowImage string
 	InspectorConnectionConfig *mysql.ConnectionConfig
+	InspectorMySQLVersion string
 	ApplierConnectionConfig *mysql.ConnectionConfig
+	ApplierMySQLVersion string
 	StartTime time.Time
 	RowCopyStartTime time.Time
 	RowCopyEndTime time.Time
@@ -232,11 +235,12 @@ func (this *MigrationContext) GetGhostTableName() string {

 // GetOldTableName generates the name of the "old" table, into which the original table is renamed.
 func (this *MigrationContext) GetOldTableName() string {
-	if this.TestOnReplica {
-		return fmt.Sprintf("_%s_ght", this.OriginalTableName)
-	}
-	if this.MigrateOnReplica {
-		return fmt.Sprintf("_%s_ghr", this.OriginalTableName)
+	if this.TimestampOldTable {
+		t := this.StartTime
+		timestamp := fmt.Sprintf("%d%02d%02d%02d%02d%02d",
+			t.Year(), t.Month(), t.Day(),
+			t.Hour(), t.Minute(), t.Second())
+		return fmt.Sprintf("_%s_%s_del", this.OriginalTableName, timestamp)
 	}
 	return fmt.Sprintf("_%s_del", this.OriginalTableName)
 }
@@ -559,7 +563,11 @@ func (this *MigrationContext) GetControlReplicasLagResult() mysql.ReplicationLag
 func (this *MigrationContext) SetControlReplicasLagResult(lagResult *mysql.ReplicationLagResult) {
 	this.throttleMutex.Lock()
 	defer this.throttleMutex.Unlock()
-	this.controlReplicasLagResult = *lagResult
+	if lagResult == nil {
+		this.controlReplicasLagResult = *mysql.NewNoReplicationLagResult()
+	} else {
+		this.controlReplicasLagResult = *lagResult
+	}
 }

 func (this *MigrationContext) GetThrottleControlReplicaKeys() *mysql.InstanceKeyMap {
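
The `SetControlReplicasLagResult` change guards against dereferencing a nil pointer. The pattern can be sketched on its own with stand-in types. Everything below is hypothetical: `lagResult` and `state` only imitate the shape of the real types, and the zero value as the "no lag" result is an assumption, since what `mysql.NewNoReplicationLagResult()` returns is not shown in this diff.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lagResult is a stand-in for mysql.ReplicationLagResult; its fields are assumptions.
type lagResult struct {
	Lag time.Duration
	Err error
}

// state is a stand-in for the part of MigrationContext that stores the result.
type state struct {
	mu  sync.Mutex
	lag lagResult
}

// setLag stores a copy of the result. A nil pointer is replaced by a
// zero-valued "no lag" result instead of being dereferenced.
func (s *state) setLag(r *lagResult) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r == nil {
		s.lag = lagResult{}
		return
	}
	s.lag = *r
}

// getLag returns a copy of the stored result.
func (s *state) getLag() lagResult {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.lag
}

func main() {
	s := &state{}
	s.setLag(&lagResult{Lag: 3 * time.Second})
	fmt.Println(s.getLag().Lag) // 3s
	s.setLag(nil)               // would panic without the nil check
	fmt.Println(s.getLag().Lag) // 0s
}
```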

go/binlog/gomysql_reader.go

Lines changed: 12 additions & 6 deletions
@@ -16,6 +16,7 @@ import (
 	"github.com/outbrain/golib/log"
 	gomysql "github.com/siddontang/go-mysql/mysql"
 	"github.com/siddontang/go-mysql/replication"
+	"golang.org/x/net/context"
 )

 type GoMySQLReader struct {
@@ -39,7 +40,16 @@ func NewGoMySQLReader(connectionConfig *mysql.ConnectionConfig) (binlogReader *G
 	}

 	serverId := uint32(binlogReader.MigrationContext.ReplicaServerId)
-	binlogReader.binlogSyncer = replication.NewBinlogSyncer(serverId, "mysql")
+
+	binlogSyncerConfig := &replication.BinlogSyncerConfig{
+		ServerID: serverId,
+		Flavor: "mysql",
+		Host: connectionConfig.Key.Hostname,
+		Port: uint16(connectionConfig.Key.Port),
+		User: connectionConfig.User,
+		Password: connectionConfig.Password,
+	}
+	binlogReader.binlogSyncer = replication.NewBinlogSyncer(binlogSyncerConfig)

 	return binlogReader, err
 }
@@ -49,10 +59,6 @@ func (this *GoMySQLReader) ConnectBinlogStreamer(coordinates mysql.BinlogCoordin
 	if coordinates.IsEmpty() {
 		return log.Errorf("Emptry coordinates at ConnectBinlogStreamer()")
 	}
-	log.Infof("Registering replica at %+v:%+v", this.connectionConfig.Key.Hostname, uint16(this.connectionConfig.Key.Port))
-	if err := this.binlogSyncer.RegisterSlave(this.connectionConfig.Key.Hostname, uint16(this.connectionConfig.Key.Port), this.connectionConfig.User, this.connectionConfig.Password); err != nil {
-		return err
-	}

 	this.currentCoordinates = coordinates
 	log.Infof("Connecting binlog streamer at %+v", this.currentCoordinates)
@@ -126,7 +132,7 @@ func (this *GoMySQLReader) StreamEvents(canStopStreaming func() bool, entriesCha
 		if canStopStreaming() {
 			break
 		}
-		ev, err := this.binlogStreamer.GetEvent()
+		ev, err := this.binlogStreamer.GetEvent(context.Background())
 		if err != nil {
 			return err
 		}
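
`GetEvent` now takes a context, and the commit passes `context.Background()`, which is never cancelled. As a hedged illustration of what the parameter makes possible, the sketch below bounds a blocking read with a timeout. `nextEvent` and `timedOut` are hypothetical stand-ins, not the go-mysql API, and the standard library `context` package is used here instead of `golang.org/x/net/context`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// nextEvent is a hypothetical stand-in for a blocking, context-aware read
// such as a binlog streamer's GetEvent(ctx).
func nextEvent(ctx context.Context, events <-chan string) (string, error) {
	select {
	case ev := <-events:
		return ev, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// timedOut reports whether waiting on an idle stream gives up after the timeout.
func timedOut(timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	_, err := nextEvent(ctx, make(chan string)) // nothing is ever sent
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut(10 * time.Millisecond)) // true
}
```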

go/cmd/gh-ost/main.go

Lines changed: 2 additions & 1 deletion
@@ -77,6 +77,7 @@ func main() {
 	flag.BoolVar(&migrationContext.OkToDropTable, "ok-to-drop-table", false, "Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?")
 	flag.BoolVar(&migrationContext.InitiallyDropOldTable, "initially-drop-old-table", false, "Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists")
 	flag.BoolVar(&migrationContext.InitiallyDropGhostTable, "initially-drop-ghost-table", false, "Drop a possibly existing Ghost table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists")
+	flag.BoolVar(&migrationContext.TimestampOldTable, "timestamp-old-table", false, "Use a timestamp in old table name. This makes old table names unique and non conflicting cross migrations")
 	cutOver := flag.String("cut-over", "atomic", "choose cut-over type (default|atomic, two-step)")
 	flag.BoolVar(&migrationContext.ForceNamedCutOverCommand, "force-named-cut-over", false, "When true, the 'unpostpone|cut-over' interactive command must name the migrated table")

@@ -242,5 +243,5 @@ func main() {
 		migrator.ExecOnFailureHook()
 		log.Fatale(err)
 	}
-	log.Info("Done")
+	fmt.Fprintf(os.Stdout, "# Done\n")
 }
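
The new `--timestamp-old-table` option is an ordinary boolean flag from Go's standard `flag` package. The sketch below shows the same registration in isolation, using a private `FlagSet` so it can be parsed more than once. The `options` type and `parseOptions` function are hypothetical and not part of `gh-ost`.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the one setting this sketch cares about.
type options struct {
	TimestampOldTable bool
}

// parseOptions parses args with its own FlagSet rather than the global one.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("gh-ost-sketch", flag.ContinueOnError)
	fs.BoolVar(&opts.TimestampOldTable, "timestamp-old-table", false, "Use a timestamp in old table name")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	// The flag package accepts both -name and --name.
	opts, err := parseOptions([]string{"--timestamp-old-table"})
	fmt.Println(opts.TimestampOldTable, err) // true <nil>
}
```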

go/logic/applier.go

Lines changed: 7 additions & 2 deletions
@@ -70,14 +70,15 @@ func (this *Applier) InitDBConnections() (err error) {
 	if err := this.readTableColumns(); err != nil {
 		return err
 	}
+	log.Infof("Applier initiated on %+v, version %+v", this.connectionConfig.ImpliedKey, this.migrationContext.ApplierMySQLVersion)
 	return nil
 }

 // validateConnection issues a simple can-connect to MySQL
 func (this *Applier) validateConnection(db *gosql.DB) error {
-	query := `select @@global.port`
+	query := `select @@global.port, @@global.version`
 	var port int
-	if err := db.QueryRow(query).Scan(&port); err != nil {
+	if err := db.QueryRow(query).Scan(&port, &this.migrationContext.ApplierMySQLVersion); err != nil {
 		return err
 	}
 	if port != this.connectionConfig.Key.Port {
@@ -141,6 +142,10 @@ func (this *Applier) ValidateOrDropExistingTables() error {
 			return err
 		}
 	}
+	if len(this.migrationContext.GetOldTableName()) > mysql.MaxTableNameLength {
+		log.Fatalf("--timestamp-old-table defined, but resulting table name (%s) is too long (only %d characters allowed)", this.migrationContext.GetOldTableName(), mysql.MaxTableNameLength)
+	}
+
 	if this.tableExists(this.migrationContext.GetOldTableName()) {
 		return fmt.Errorf("Table %s already exists. Panicking. Use --initially-drop-old-table to force dropping it, though I really prefer that you drop it or rename it away", sql.EscapeName(this.migrationContext.GetOldTableName()))
 	}
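
The length check added in `ValidateOrDropExistingTables` matters because the timestamp makes the old table name 15 characters longer (an underscore plus 14 digits). A minimal Go sketch of the same check follows. It assumes `mysql.MaxTableNameLength` is 64, MySQL's identifier limit, since the constant's value is not shown in this diff. `checkOldTableName` is a hypothetical helper that returns an error rather than exiting.

```go
package main

import (
	"fmt"
	"strings"
)

// maxTableNameLength is MySQL's 64-character identifier limit. That
// mysql.MaxTableNameLength has this value is an assumption of this sketch.
const maxTableNameLength = 64

// checkOldTableName rejects names over the limit. Like the check in the
// diff, it uses len, which counts bytes.
func checkOldTableName(name string) error {
	if len(name) > maxTableNameLength {
		return fmt.Errorf("table name %q is too long: %d characters, only %d allowed",
			name, len(name), maxTableNameLength)
	}
	return nil
}

func main() {
	short := "_gh_ost_test_20170221103147_del"
	long := "_" + strings.Repeat("x", 50) + "_20170221103147_del"
	fmt.Println(checkOldTableName(short)) // <nil>
	fmt.Println(checkOldTableName(long) != nil)
}
```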

go/logic/inspect.go

Lines changed: 5 additions & 4 deletions
@@ -60,6 +60,7 @@ func (this *Inspector) InitDBConnections() (err error) {
 	if err := this.applyBinlogFormat(); err != nil {
 		return err
 	}
+	log.Infof("Inspector initiated on %+v, version %+v", this.connectionConfig.ImpliedKey, this.migrationContext.InspectorMySQLVersion)
 	return nil
 }

@@ -168,9 +169,9 @@ func (this *Inspector) inspectOriginalAndGhostTables() (err error) {

 // validateConnection issues a simple can-connect to MySQL
 func (this *Inspector) validateConnection() error {
-	query := `select @@global.port`
+	query := `select @@global.port, @@global.version`
 	var port int
-	if err := this.db.QueryRow(query).Scan(&port); err != nil {
+	if err := this.db.QueryRow(query).Scan(&port, &this.migrationContext.InspectorMySQLVersion); err != nil {
 		return err
 	}
 	if port != this.connectionConfig.Key.Port {
@@ -347,7 +348,7 @@ func (this *Inspector) validateLogSlaveUpdates() error {
 	}

 	if this.migrationContext.IsTungsten {
-		log.Warning("log_slave_updates not found on %s:%d, but --tungsten provided, so I'm proceeding", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
+		log.Warningf("log_slave_updates not found on %s:%d, but --tungsten provided, so I'm proceeding", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
 		return nil
 	}

@@ -356,7 +357,7 @@ func (this *Inspector) validateLogSlaveUpdates() error {
 	}

 	if this.migrationContext.InspectorIsAlsoApplier() {
-		log.Warning("log_slave_updates not found on %s:%d, but executing directly on master, so I'm proceeeding", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
+		log.Warningf("log_slave_updates not found on %s:%d, but executing directly on master, so I'm proceeeding", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
 		return nil
 	}
