How Unsplittable-Flow-Covering helps Scheduling with Job-Dependent Cost Functions

Generalizing many well-known and natural scheduling problems, scheduling with job-specific cost functions has recently gained a lot of attention. In this setting, each job incurs a cost depending on its completion time, given by a private cost function, and one seeks to schedule the jobs so as to minimize the total sum of these costs. The framework captures many important scheduling objectives such as weighted flow time or weighted tardiness. Still, the general case as well as the mentioned special cases are far from being well understood, even on a single machine. Aiming for a better general understanding of this problem, in this paper we focus on the case of uniform job release dates on one machine, for which the state of the art is a 4-approximation algorithm. This is true even for a special case that is equivalent to the covering version of the well-studied and prominent unsplittable flow on a path problem, which is interesting in its own right. For that covering problem, we present a quasi-polynomial time $(1+\epsilon)$-approximation algorithm that yields an $(e+\epsilon)$-approximation for the above scheduling problem. Moreover, for the latter we devise the best possible resource augmentation result regarding speed: a polynomial time algorithm which computes a solution with \emph{optimal} cost at $1+\epsilon$ speedup. Finally, we present an elegant QPTAS for the special case where the cost functions of the jobs fall into at most $\log n$ many classes. This algorithm even allows the jobs to have up to $\log n$ many distinct release dates.


Introduction
In scheduling, a natural way to evaluate the quality of a computed solution is to assign a cost to each job which depends on its completion time. The goal is then to minimize the sum of these costs. The function describing this dependence may be completely different for each job. There are many well-studied and important scheduling objectives which can be cast in this framework. Some of them are already very well understood, for instance the weighted sum of completion times $\sum_j w_j C_j$, for which there are polynomial time approximation schemes (PTASs) [1], even for multiple machines and very general machine models. On the other hand, for natural and important objectives such as weighted flow time or weighted tardiness, not even a constant factor polynomial time approximation algorithm is known, even on a single machine. In a recent breakthrough result, Bansal and Pruhs presented an $O(\log \log P)$-approximation algorithm [7,6] for the single machine case where every job has its private cost function. Formally, they study the General Scheduling Problem (GSP) where the input consists of a set of jobs $J$, each job $j \in J$ being specified by a processing time $p_j$, a release date $r_j$, and a non-decreasing cost function $f_j$; the goal is to compute a preemptive schedule on one machine which minimizes $\sum_j f_j(C_j)$, where $C_j$ denotes the completion time of job $j$ in the computed schedule. Interestingly, even though this problem is very general, subsuming all the objectives listed above, the best known complexity result for it is only strong NP-hardness, so there might even be a polynomial time $(1+\varepsilon)$-approximation.
Aiming to better understand GSP, in this paper we investigate the special case that all jobs are released at time 0. This case is still strongly NP-hard [20] and the currently best known approximation algorithm for it is a $(4+\varepsilon)$-approximation algorithm [18,23]. As observed by Bansal and Verschae [8], this problem is a generalization of the covering version of the well-studied Unsplittable Flow on a Path problem (UFP) [2,3,5,11,14,17]. The input of this problem consists of a path, each edge $e$ having a demand $u_e$, and a set of tasks $T$. Each task $i$ is specified by a start vertex $s_i$, an end vertex $t_i$, a size $p_i$, and a cost $c_i$. In the covering version, the goal is to select a subset of the tasks $T' \subseteq T$ which covers the demand profile, i.e., $\sum_{i \in T' \cap T_e} p_i \ge u_e$ for each edge $e$, where $T_e$ denotes all tasks in $T$ whose path uses $e$. The objective is to minimize the total cost $\sum_{i \in T'} c_i$. This covering version of UFP has applications to resource allocation settings such as workforce and energy management, making it an interesting problem in its own right. For example, one can think of the tasks as representing time intervals when employees are available, and one aims at providing a certain service level that changes over the day. UFP-cover is a generalization of the knapsack cover problem [12] and corresponds to instances of GSP without release dates where the cost function of each job attains only the values 0, some job-dependent value $c_i$, and $\infty$. The best known approximation algorithm for UFP-cover is a 4-approximation [9,13], which essentially matches the best known result for GSP without release dates.
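To fix notation, the covering condition can be sketched as follows. This is our own toy data model, not code from the paper: edges are numbered $1, \ldots, m$ (edge $e$ connects vertices $e-1$ and $e$), and a task $(s_i, t_i, p_i, c_i)$ covers edge $e$ iff $s_i < e \le t_i$.

```python
# Toy model of a UFP-cover instance: demands is a dict edge -> u_e, and
# tasks are (s, t, p, c) tuples.  A task covers edge e iff s < e <= t.

def covers(task, e):
    """Return True if the task's subpath contains edge e."""
    s, t, _, _ = task
    return s < e <= t

def is_cover(tasks, demands):
    """Check sum_{i in T' with e on its path} p_i >= u_e for every edge e."""
    return all(
        sum(p for (s, t, p, c) in tasks if s < e <= t) >= u
        for e, u in demands.items()
    )

def total_cost(tasks):
    """Objective value of the selection: total cost of the chosen tasks."""
    return sum(c for (_, _, _, c) in tasks)
```

A selection is feasible exactly when `is_cover` holds, and the objective is `total_cost`.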
Our Contribution. In this paper we present several new approximation results for GSP without release dates and some of its special cases. First, we give a $(1+\varepsilon)$-approximation algorithm for the covering version of UFP with quasi-polynomial running time. Our algorithm follows the high-level idea of the known QPTAS for the packing version [3]. Its key concept is to start with an edge in the middle and to consider the tasks using it. One divides these tasks into groups, all tasks in a group having roughly the same size and cost, and guesses for each group an approximation of the capacity profile used by the tasks from that group. In the packing version, one can show that by slightly underestimating the true profile one still obtains almost the same profit as the optimum. For the covering version, a natural adjustment would be to use an approximate profile which overestimates the true profile. However, when using only a polynomial number of approximate profiles, it can happen that the instance simply does not contain enough tasks from a group to cover the overestimated profile which best approximates the actual profile.
We remedy this problem in a perhaps counterintuitive fashion. Instead of guessing an approximate upper bound of the true profile, we first guess a lower bound of it. Then we select tasks that cover this lower bound, and finally add a small number of "maximally long" additional tasks. Using this procedure, we cannot guarantee (instance-independently) by how much our selected tasks exceed the guessed profile on each edge. However, we can guarantee that for the correctly guessed profile, we cover at least as much as the optimum and pay only slightly more. Together with the recursive framework from [3], we obtain a QPTAS. As an application, we use this algorithm to obtain a quasi-polynomial time $(e+\varepsilon)$-approximation algorithm for GSP with uniform release dates, improving on the approximation ratio of the best known polynomial time $(4+\varepsilon)$-approximation algorithm [18,23].
Moreover, we consider a different way to relax the problem. Rather than sacrificing a $1+\varepsilon$ factor in the objective value, we present a polynomial time algorithm that computes a solution with optimal cost but requiring a speedup of $1+\varepsilon$. Such a result can be easily obtained for job-independent, scalable cost functions using the PTAS in [22] (a cost function $f$ is scalable if $f(c \cdot t) = \varphi(c) \cdot f(t)$ for some suitable function $\varphi$ and all $c, t \ge 0$). In our case, however, the cost functions of the jobs can be much more complicated and, even worse, they can be different for each job. Our algorithm first imposes some simplification on the solutions under consideration, at the cost of a $(1+\varepsilon)$-speedup. Then, we use a recently introduced technique to first guess a set of discrete intervals representing slots for large jobs and then use a linear program to simultaneously assign large jobs into these slots and small jobs into the remaining idle times [25].
An interesting open question is to design a (Q)PTAS for GSP without release dates. As a first step towards this goal, Megow and Verschae [22] recently presented a PTAS for minimizing the objective function $\sum_j w_j g(C_j)$ where each job $j$ has a private weight $w_j$ but the function $g$ is identical for all jobs. In Section 4 we present a QPTAS for a generalization of this setting. Instead of only one function $g$ for all jobs, we allow up to $(\log n)^{O(1)}$ such functions, each job using one of them, and we even allow the jobs to have up to $(\log n)^{O(1)}$ distinct release dates. Despite the fact that this setting is much more general, our algorithm is very clean and easy to analyze.

Related Work. The $O(\log \log P)$-approximation algorithm for GSP by Bansal and Pruhs [7,6] is now the best known polynomial time approximation result. For instance, for the important weighted flow time objective, the previously best known approximation factors were $O(\log^2 P)$, $O(\log W)$ and $O(\log nP)$ [4,16], where $P$ and $W$ denote the ranges of the job processing times and weights, respectively. A QPTAS with running time $n^{O_\varepsilon(\log P \log W)}$ is also known [15]. For the objective of minimizing the weighted sum of completion times, PTASs are known, even for an arbitrary number of identical machines and a constant number of unrelated machines [1].
For the case of GSP with identical release dates, Bansal and Pruhs [7] give a 16-approximation algorithm. Later, Shmoys and Cheung claimed a primal-dual $(2+\varepsilon)$-approximation algorithm [18]. However, an instance was later found on which the algorithm constructs a dual solution that differs from the best integral solution by a factor of 4 [23], suggesting that the primal-dual analysis can show an approximation ratio of at best 4. On the other hand, Mestre and Verschae [23] showed that the local-ratio interpretation of that algorithm (recall the close relation between the primal-dual schema and the local-ratio technique [10]) is in fact a pseudo-polynomial time 4-approximation, yielding a $(4+\varepsilon)$-approximation in polynomial time.
As mentioned above, a special case of GSP with uniform release dates is a generalization of the covering version of Unsplittable Flow on a Path. For this special case, a 4-approximation algorithm is known [9,13]. The packing version is very well studied. After a series of papers on the problem and its special cases [5,11,14,17], the currently best known approximation results are a QPTAS [3] and a $(2+\varepsilon)$-approximation in polynomial time [2].

Quasi-PTAS for UFP-Cover
In this section, we present a quasi-polynomial time (1 + ε)-approximation algorithm for the UFP-cover problem. Subsequently, we show how it can be used to obtain an approximation algorithm with approximation ratio e + ε ≈ 2.718 + ε and quasi-polynomial running time for GSP without release dates. Throughout this section, we assume that the sizes of the tasks are quasi-polynomially bounded. Our algorithm follows the structure from the QPTAS for the packing version of Unsplittable Flow on a Path due to Bansal et al. [3]. First, we describe a recursive exact algorithm with exponential running time. Subsequently, we describe how to turn this routine into an algorithm with only quasi-polynomial running time and an approximation ratio of 1 + ε.
For computing the exact solution (in exponential time) one can use the following recursive algorithm: Given the path $G=(V,E)$, denote by $e_M$ the edge in the middle of $G$ and let $T_M$ denote the tasks that use $e_M$. Our strategy is to "guess" which tasks in $T_M$ are contained in OPT, the (unknown) optimal solution. Note that once these tasks are chosen, the remaining problem splits into the two independent subproblems given by the edges on the left and on the right of $e_M$, respectively, and the tasks whose paths are fully contained in them. Therefore, we enumerate all subsets $T'_M \subseteq T_M$; denote by $\mathcal{T}_M$ the resulting set of sets. For each set $T'_M \in \mathcal{T}_M$ we recursively compute the optimal solution for the subpaths $\{e_1, \ldots, e_{M-1}\}$ and $\{e_{M+1}, \ldots, e_{|E|}\}$, subject to the tasks in $T'_M$ being already chosen and no more tasks from $T_M$ being allowed to be chosen. The leaf subproblems arise when the path in the recursive call has only one edge. Since $|E| = O(n)$, this procedure has a recursion depth of $O(\log n)$, which is helpful when aiming at quasi-polynomial running time. However, since in each recursive step we try each set $T'_M \in \mathcal{T}_M$, the running time is exponential (even in one single step of the recursion). To remedy this issue, we will show that for any set $\mathcal{T}_M$ appearing in the recursive procedure there is a set $\hat{\mathcal{T}}_M$ which is of small size and which approximates $\mathcal{T}_M$ well. More precisely, we can compute $\hat{\mathcal{T}}_M$ in quasi-polynomial time (and it thus has only quasi-polynomial size), and it contains a set $T^*_M$ that dominates $T_M \cap \mathrm{OPT}$ and costs at most a factor $1+\varepsilon$ more. For any set of tasks $T'$ we write $c(T') := \sum_{i \in T'} c_i$, and for two sets of tasks $T_1, T_2$, we say that $T_1$ dominates $T_2$ if $\sum_{i \in T_1 \cap T_e} p_i \ge \sum_{i \in T_2 \cap T_e} p_i$ for each edge $e$. We modify the above procedure so that we recurse on the sets in $\hat{\mathcal{T}}_M$ instead of $\mathcal{T}_M$. Since $\hat{\mathcal{T}}_M$ has quasi-polynomial size, $\hat{\mathcal{T}}_M$ contains the mentioned set $T^*_M$, and the recursion depth is $O(\log n)$, the resulting algorithm is a QPTAS.
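The exponential-time recursion above can be sketched as follows, using the same toy model as before (our own illustration, not the paper's code: edges are numbered $1, \ldots, m$, and a task $(s,t,p,c)$ covers edge $e$ iff $s < e \le t$).

```python
from itertools import combinations

def ufp_cover_exact(edges, tasks, demands):
    """Minimum cost of covering `demands` on `edges`, or inf if impossible."""
    if not edges:
        return 0.0
    mid = edges[len(edges) // 2]                    # middle edge e_M
    T_M = [i for i in tasks if i[0] < mid <= i[1]]  # tasks using e_M
    best = float("inf")
    # guess which tasks of T_M the optimum uses
    for r in range(len(T_M) + 1):
        for sub in combinations(T_M, r):
            if sum(i[2] for i in sub) < demands[mid]:
                continue  # middle edge would stay uncovered
            # residual demands once the guessed tasks are accounted for
            res = {e: max(0, demands[e]
                          - sum(p for (s, t, p, c) in sub if s < e <= t))
                   for e in edges}
            cost = sum(i[3] for i in sub)
            # recurse on the subpaths left and right of e_M with the
            # tasks fully contained in them
            cost += ufp_cover_exact([e for e in edges if e < mid],
                                    [i for i in tasks if i[1] < mid], res)
            cost += ufp_cover_exact([e for e in edges if e > mid],
                                    [i for i in tasks if i[0] >= mid], res)
            best = min(best, cost)
    return best
```

The recursion depth is logarithmic in the number of edges, but each level enumerates all subsets of $T_M$; replacing that enumeration by the small family $\hat{\mathcal{T}}_M$ is exactly what the QPTAS does.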
In the sequel, we describe the above algorithm in detail and show in particular how to obtain the set $\hat{\mathcal{T}}_M$.

Formal Description of the Algorithm
We use a binary search procedure to guess the optimal objective value $B$. First, we reject all tasks $i$ whose cost is larger than $B$ and select all tasks $i$ whose cost is at most $\varepsilon B/n$. The latter cost at most $n \cdot \varepsilon B/n \le \varepsilon B$ in total and thus only a factor $1+\varepsilon$ in the approximation ratio. We update the demand profile accordingly.
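This preprocessing step can be sketched as follows (our own illustration; tasks are $(s,t,p,c)$ tuples as before, with $c$ the cost):

```python
def preprocess_by_cost(tasks, B, eps, n):
    """Classify tasks relative to a guessed optimum value B.

    Tasks costing more than B cannot appear in a solution of value B, so
    they are rejected.  Tasks costing at most eps*B/n are all selected:
    even taking all n of them adds at most eps*B, i.e., a 1+eps factor.
    The remaining tasks are the ones the algorithm actually reasons about.
    """
    rejected = [i for i in tasks if i[3] > B]
    selected = [i for i in tasks if i[3] <= eps * B / n]
    remaining = [i for i in tasks if eps * B / n < i[3] <= B]
    return selected, remaining, rejected
```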
We define a recursive procedure UFPcover$(E', T')$ which gets as input a subpath $E' \subseteq E$ of $G$ and a set of already chosen tasks $T'$. Denote by $\bar{T}$ the set of all tasks $i \in T \setminus T'$ such that the path of $i$ uses only edges in $E'$. The output of UFPcover$(E', T')$ is a $(1+\varepsilon)$-approximation to the minimum cost solution for the subproblem of selecting a set of tasks $T'' \subseteq \bar{T}$ such that $T' \cup T''$ satisfies all demands of the edges in $E'$, i.e., $\sum_{i \in (T' \cup T'') \cap T_e} p_i \ge d_e$ for each edge $e \in E'$. Note that there might be no feasible solution for this subproblem, in which case we output $\infty$. Let $e_M$ be the edge in the middle of $E'$, i.e., at most $|E'|/2$ edges are on the left and on the right of $e_M$, respectively. Denote by $T_M \subseteq \bar{T}$ all tasks in $\bar{T}$ whose path uses $e_M$. As described above, the key is now to construct the set $\hat{\mathcal{T}}_M$ with the above properties. Given this set, we recurse on the subpaths $E'_L$ and $E'_R$ of $E'$ on the left and on the right of $e_M$, respectively, for each candidate set in $\hat{\mathcal{T}}_M$, and output the cheapest resulting solution. To construct $\hat{\mathcal{T}}_M$, we partition $T_M$ into groups: for each pair $(k, \ell)$, denoting (approximately) cost $(1+\varepsilon)^k$ and size $(1+\varepsilon)^\ell$, we define $T_{(k,\ell)}$ as the set of tasks in $T_M$ whose cost lies in $[(1+\varepsilon)^k, (1+\varepsilon)^{k+1})$ and whose size lies in $[(1+\varepsilon)^\ell, (1+\varepsilon)^{\ell+1})$. Since the sizes of the tasks are quasi-polynomially bounded and we preprocessed the costs of the tasks, there are $(\log n)^{O(1)}$ non-empty groups.
For each group $T_{(k,\ell)}$, we compute a set $\hat{\mathcal{T}}_{(k,\ell)}$ containing at least one set which is not much more expensive than $\mathrm{OPT}_{(k,\ell)} := \mathrm{OPT} \cap T_{(k,\ell)}$ and which dominates $\mathrm{OPT}_{(k,\ell)}$. To this end, observe that the sizes of the tasks in $\mathrm{OPT}_{(k,\ell)}$ cover a certain profile (see Figure 1). Initially, we guess the number of tasks in $\mathrm{OPT}_{(k,\ell)}$, and if $|\mathrm{OPT}_{(k,\ell)}| \le \frac{1}{\varepsilon^2}$ then we simply enumerate all subsets of $T_{(k,\ell)}$ with at most $\frac{1}{\varepsilon^2}$ tasks. Otherwise, we consider a polynomial number of profiles that are potential approximations of the true profile covered by $\mathrm{OPT}_{(k,\ell)}$. To this end, we subdivide the (implicitly) guessed height of the true profile evenly into $\frac{1}{\varepsilon}$ steps of uniform height, and we allow the approximate profiles to use only those heights while being monotonically non-decreasing before and non-increasing after $e_M$ (observe that $\mathrm{OPT}_{(k,\ell)}$ also has this property since all its tasks use $e_M$). This leads to at most $n^{O(1/\varepsilon)}$ different approximate profiles in total.
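The discretization step can be sketched as follows (our own illustration, with `num_tasks * size_upper` standing in for the upper bound $|\mathrm{OPT}_{(k,\ell)}| \cdot (1+\varepsilon)^{\ell+1}$ on the maximum profile height). Rounding each edge height down to the nearest allowed level keeps the resulting profile dominated by the true one, leaves each per-edge gap below one step, and preserves monotonicity.

```python
def height_levels(num_tasks, size_upper, eps):
    """The 1/eps uniform height levels used by the approximate profiles."""
    step = eps * num_tasks * size_upper
    return [j * step for j in range(1, int(round(1 / eps)) + 1)]

def closest_dominated_profile(true_profile, levels):
    """Round every edge height down to the nearest allowed level (or 0)."""
    return [max([0.0] + [x for x in levels if x <= h]) for h in true_profile]
```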
For each approximate profile we compute a set of tasks covering it using LP rounding. The path of any task in $T_{(k,\ell)}$ contains the edge $e_M$, and hence a task covering an edge $e$ always covers all edges in between $e$ and $e_M$ as well. Thus, when formulating the problem as an LP, it suffices to introduce one constraint for the leftmost and one constraint for the rightmost edge of each height in the approximate profile. We compute an extreme point solution of the LP and round up each of the at most $\frac{2}{\varepsilon}$ fractional variables. Since $|\mathrm{OPT}_{(k,\ell)}| \ge \frac{1}{\varepsilon^2}$, this increases the cost by at most a factor $1+O(\varepsilon)$ compared to the cost of the LP.
It is clear that the LP has a solution if the approximate profile is dominated by the true profile. Among such approximate profiles, consider the one that is closest to the latter. On each edge it would suffice to add $O(\varepsilon \cdot |\mathrm{OPT}_{(k,\ell)}|)$ tasks from $T_{(k,\ell)}$ in order to close the remaining gap. This is due to our choice of the step size of the approximate profile and the fact that all tasks in $T_{(k,\ell)}$ have roughly the same size. To this end, from the not yet selected tasks in $T_{(k,\ell)}$ we add the $O(\varepsilon \cdot |\mathrm{OPT}_{(k,\ell)}|)$ tasks with the leftmost start vertices and the $O(\varepsilon \cdot |\mathrm{OPT}_{(k,\ell)}|)$ tasks with the rightmost end vertices (see Figure 1). This again costs at most an $O(\varepsilon)$-fraction of the cost so far. As a result, on each edge $e$ we have either selected $O(\varepsilon \cdot |\mathrm{OPT}_{(k,\ell)}|)$ additional tasks using it, thus closing the remaining gap, or we have selected all tasks from $T_{(k,\ell)}$ using $e$. In either case, the selected tasks dominate the tasks in $\mathrm{OPT}_{(k,\ell)}$, i.e., the true profile. The above procedure is described in detail in Appendix A.
Lemma 1. Given a group $T_{(k,\ell)}$, there is a polynomial time algorithm which computes a set of task sets $\hat{\mathcal{T}}_{(k,\ell)}$ which contains a set $T^*_{(k,\ell)} \in \hat{\mathcal{T}}_{(k,\ell)}$ such that $c(T^*_{(k,\ell)}) \le (1+\varepsilon) \cdot c(\mathrm{OPT}_{(k,\ell)})$ and $T^*_{(k,\ell)}$ dominates $\mathrm{OPT}_{(k,\ell)}$.

We define the set $\hat{\mathcal{T}}_M$ by taking all combinations of selecting exactly one set from the set $\hat{\mathcal{T}}_{(k,\ell)}$ of each group $T_{(k,\ell)}$. Since there are $(\log n)^{O(1)}$ groups, by Lemma 1 the set $\hat{\mathcal{T}}_M$ has only quasi-polynomial size and it contains one set $T^*_M$ which is a good approximation to $T_M \cap \mathrm{OPT}$, i.e., the set $T^*_M$ dominates $T_M \cap \mathrm{OPT}$ and is at most a factor $1+O(\varepsilon)$ more expensive. Now each node in the recursion tree has at most $n^{(\log n)^{O(1)}}$ children and, as argued above, the recursion depth is $O(\log n)$. Thus, a call to UFPcover$(E, \emptyset)$ has quasi-polynomial running time and yields a $(1+O(\varepsilon))$-approximation for the overall problem.

Theorem 1. For any $\varepsilon > 0$ there is a quasi-polynomial $(1+\varepsilon)$-approximation algorithm for UFP-cover if the sizes of the tasks are in a quasi-polynomial range.
Bansal and Pruhs [7] give a 4-approximation-preserving reduction from GSP with uniform release dates to UFP-cover using geometric rounding. Here we observe that if instead we use randomized geometric rounding [19], then one can obtain an e-approximation-preserving reduction. Together with our QPTAS for UFP-cover, we get the following result, whose proof we defer to Appendix A.
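A toy demonstration of the rounding primitive (not the reduction itself, which rounds the cost functions): rounding a value $v$ up to the nearest $e^{k+\alpha}$, $k$ integer, with a random offset $\alpha$, always inflates $v$ by a factor in $[1, e)$; only the worst case, not the typical case, is $e$.

```python
import math
import random

def geometric_round_up(v, alpha, base=math.e):
    """Smallest base**(k + alpha) with integer k that is >= v (for v > 0)."""
    k = math.ceil(math.log(v, base) - alpha)
    return base ** (k + alpha)

# Empirically, every inflation factor lies in [1, e).
random.seed(0)
ratios = []
for _ in range(5000):
    v = random.uniform(1.0, 100.0)
    ratios.append(geometric_round_up(v, random.random()) / v)
assert all(1.0 - 1e-9 <= r < math.e + 1e-9 for r in ratios)
```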
Theorem 2. For any ε > 0 there is a quasi-polynomial time (e+ε)-approximation algorithm for GSP with uniform release dates.

General Cost Functions under Speedup
We present a polynomial time algorithm which computes a solution for an instance of GSP with uniform release dates whose cost is optimal and which is feasible if the machine runs with speed 1 + ε (rather than unit speed).
Let $1 > \varepsilon > 0$ be a constant and assume for simplicity that $\frac{1}{\varepsilon} \in \mathbb{N}$. For our algorithm, we first prove some properties that we can assume "at $1+\varepsilon$ speedup"; by this, we mean that there is a schedule whose cost is at most the optimal cost (without enforcing these restricting properties) and which is feasible if we increase the speed of the machine by a factor $1+\varepsilon$. Many statements are similar to properties that are used in [1] for constructing PTASs for the problem of minimizing the weighted sum of completion times.
For a given schedule, denote by $S_j$ and $C_j$ the start and completion times of job $j$ (recall that we consider only non-preemptive schedules). We define $C_j^{(1+\varepsilon)}$ to be the smallest power of $1+\varepsilon$ which is not smaller than $C_j$, i.e., $C_j^{(1+\varepsilon)} := (1+\varepsilon)^{\lceil \log_{1+\varepsilon} C_j \rceil}$, and adjust the objective function as given in the next lemma. Also, we impose that jobs that are relatively large are not processed too early; formally, they do not start before $(1+\varepsilon)^{\lfloor \log_{1+\varepsilon} (\varepsilon \cdot p_j/(1+\varepsilon)) \rfloor}$, which is the largest power of $1+\varepsilon$ that is at most $\varepsilon/(1+\varepsilon) \cdot p_j$ (the speedup will compensate for the delay of the start time).
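Both rounding operations can be sketched directly (our own illustration of the formulas above):

```python
import math

def round_completion(C, eps):
    """C^(1+eps): smallest power of 1+eps that is >= C (assumes C >= 1)."""
    return (1 + eps) ** math.ceil(math.log(C, 1 + eps))

def earliest_start(p, eps):
    """Largest power of 1+eps that is at most eps/(1+eps) * p."""
    return (1 + eps) ** math.floor(math.log(eps * p / (1 + eps), 1 + eps))
```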
Next, we discretize the time axis into intervals of the form $I_t := [(1+\varepsilon)^t, (1+\varepsilon)^{t+1})$. Following Lemma 2, to simplify the problem we assign an artificial release date to each job $j$: we define $r(j) := (1+\varepsilon)^{\lfloor \log_{1+\varepsilon} (\varepsilon \cdot p_j/(1+\varepsilon)) \rfloor}$. Lemma 2 then implies that we can assume $S_j \ge r(j)$ for each job $j$. Therefore, we interpret the value $r(j)$ as the release date of job $j$ and from now on disallow starting job $j$ before time $r(j)$.
In a given schedule, we call a job $j$ large if $S_j \le \frac{1}{\varepsilon^3} \cdot p_j$ and small otherwise. For the large jobs, we do not allow arbitrary start times: we discretize the time axis so that each interval contains only a constant number of start points for large jobs (for constant $\varepsilon$). For the small jobs, we do not want them to overlap interval boundaries, and we want all small jobs scheduled in an interval $I_t$ to be scheduled during one (connected) subinterval $I^s_t \subseteq I_t$.

Lemma 3. At $1+O(\varepsilon)$ speedup we can assume that each small job starting during an interval $I_t$ finishes during $I_t$, each interval $I_t$ contains only $O(\frac{1}{\varepsilon^3})$ potential start points for large jobs, and for each interval $I_t$ there is a time interval $I^s_t \subseteq I_t$, ranging from one potential start point for large jobs to another, which contains all small jobs scheduled in $I_t$ and no large jobs.
For the moment, let us assume that the processing times of the instance are polynomially bounded. We will give a generalization to arbitrary instances later.
Our strategy is the following: Since the processing times are bounded, the whole schedule finishes within $\log_{1+\varepsilon}(\sum_j p_j) \le O_\varepsilon(\log n)$ intervals. Ideally, we would like to guess the placement of all large jobs in the schedule and then use a linear program to fill in the remaining small jobs. However, this would result in $n^{O_\varepsilon(\log n)}$ possibilities for the large jobs, which is quasi-polynomial but not polynomial. Instead, we only guess the pattern of large-job usage for each interval. A pattern $P$ for an interval $I_t$ is a set of $O(\frac{1}{\varepsilon^3})$ integers which defines the start and end times of the large jobs executed during $I_t$. Note that such a job might start before $I_t$ and/or end after $I_t$.
Proposition 1. For each interval $I_t$ there are only $N \in O_\varepsilon(1)$ many possible patterns. The value $N$ is independent of $t$.
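To make Proposition 1 concrete, the following sketch (a simplified stand-in of our own; the paper's patterns also allow slots overhanging the interval boundaries) enumerates all sets of pairwise-disjoint slots with endpoints on a grid of `grid_points` positions. The count depends only on the grid size, mirroring that $N$ is independent of $t$.

```python
def patterns(grid_points):
    """All ways to pick pairwise-disjoint slots [a, b) with integer
    endpoints 0 <= a < b <= grid_points, listed left to right."""
    results = []
    def extend(start, current):
        results.append(tuple(current))          # current partial pattern
        for a in range(start, grid_points):
            for b in range(a + 1, grid_points + 1):
                extend(b, current + [(a, b)])   # next slot begins at >= b
    extend(0, [])
    return results
```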
We first guess all patterns for all intervals in parallel. Since there are only $O_\varepsilon(\log n)$ intervals, this yields only $N^{O_\varepsilon(\log n)} \le n^{O_\varepsilon(1)}$ possible combinations of patterns for all intervals. Suppose now that we guessed the patterns corresponding to the optimal solution correctly. Next, we solve a linear program that in parallel assigns large jobs to the slots specified by the patterns and assigns small jobs into the remaining idle times of the intervals. Formally, we solve the following LP. We denote by $Q$ the set of all slots for large jobs; $\mathrm{size}(s)$ denotes the length of a slot $s$, $\mathrm{begin}(s)$ its start time, and $t(s)$ the index of the interval $I_t$ that contains $s$. For each interval $I_t$, denote by $\mathrm{rem}(t)$ the remaining idle time for small jobs, and consider these idle times as slots for small jobs, which we refer to by their interval indices $I := \{1, \ldots, \log_{1+\varepsilon}(\sum_j p_j)\}$. For each pair of a slot $s \in Q$ and a job $j \in J$, we introduce a variable $x_{s,j}$ corresponding to assigning $j$ to $s$. Analogously, we use variables $y_{t,j}$ for the slots in $I$.
$x_{s,j},\, y_{t,j} \ge 0 \qquad \forall s \in Q,\ \forall t \in I,\ \forall j \in J \qquad (7)$

Denote the above LP by sLP. It has polynomial size and thus we can solve it efficiently. Borrowing ideas from [24], we round it to a solution that is no more costly and which can be made feasible using an additional speedup of $1+\varepsilon$.
In particular, the cost of the computed solution is no more than the cost of the integral optimum, and it is feasible under $1+O(\varepsilon)$ speedup (accumulating all the speedups from the previous lemmas). We remark that the technique of guessing patterns and filling them in by a linear program was first used in [25]. For the general case, i.e., for arbitrary processing times, we first show that at $1+\varepsilon$ speedup we can assume that for each job $j$ there are only $O(\log n)$ intervals between $r(j)$ (the artificial release date of $j$) and $C_j$. Then we devise a dynamic program which moves from left to right on the time axis and considers sets of $O(\log n)$ intervals at a time, using the above technique. See Appendix C for details.

Theorem 3. Let $\varepsilon > 0$. There is a polynomial time algorithm for GSP with uniform release dates which computes a solution with optimal cost and which is feasible if the machine runs with speed $1+\varepsilon$.

Few Classes of Cost Functions
In this section, we study the following special case of GSP with release dates. We assume that each cost function $f_j$ can be expressed as $f_j = w_j \cdot g_{u(j)}$ for a job-dependent weight $w_j$, $k$ global functions $g_1, \ldots, g_k$, and an assignment $u: J \to [k]$ of cost functions to jobs. We present a QPTAS for this problem, assuming that $k = (\log n)^{O(1)}$ and that the jobs have at most $(\log n)^{O(1)}$ distinct release dates. We assume that the job weights are in a quasi-polynomial range, i.e., there is an upper bound $W = 2^{(\log n)^{O(1)}}$ on the (integral) job weights.
In our algorithm, we first round the values of the functions $g_i$ so that they attain only few values, $(\log n)^{O(1)}$ many. Then we guess the $(\log n)^{O(1)}/\varepsilon$ most expensive jobs and their costs. For the remaining problem, we use a linear program. Since we rounded the functions $g_i$, our LP is sparse, and by rounding an extreme point solution we increase the cost by at most an $\varepsilon$-fraction of the cost of the previously guessed jobs, which yields a $(1+\varepsilon)$-approximation overall.
Formally, we use a binary search framework to estimate the optimal value $B$. Given this estimate, we adjust the functions $g_i$ so that each of them is a step function with at most $(\log n)^{O(1)}$ steps, all step values being powers of $1+\varepsilon$ or 0.
Lemma 5. At $1+\varepsilon$ loss we can assume that for each $i \in [k]$ and each $t$ it holds that $g_i(t)$ is either 0 or a power of $1+\varepsilon$ in $\left[\frac{\varepsilon}{n} \cdot \frac{B}{W}, B\right]$.

Our problem is in fact equivalent to assigning a due date $d_j$ to each job (cf. [7]) such that the due dates are feasible, meaning that there is a preemptive schedule in which every job finishes no later than its due date; the objective is then $\sum_j f_j(d_j)$. The following lemma characterizes when a set of due dates is feasible. Denote by $D$ all points in time where at least one cost function $g_i$ increases. It suffices to consider only those values as possible due dates.

Proposition 2.
There is an optimal due date assignment such that d j ∈ D for each job j.
Denote by $R$ the set of all release dates of the jobs. Recall that $|R| \le (\log n)^{O(1)}$. We now guess the $|D| \cdot |R|/\varepsilon$ most expensive jobs of the optimal solution and their respective costs. Due to the rounding in Lemma 5 we have $|D| \le k \cdot \log_{1+\varepsilon}(W n/\varepsilon) = (\log n)^{O(1)}$ and thus there are only $O(n^{|D| \cdot |R|/\varepsilon}) = n^{(\log n)^{O(1)}/\varepsilon}$ many guesses.
Suppose we guess this information correctly. Let $J_E$ denote the guessed jobs, and for each job $j \in J_E$ denote by $d_j$ the latest time at which it attains the guessed cost, i.e., its due date. Denote by $c_{\mathrm{thres}}$ the minimum cost of a job in $J_E$, according to the guessed costs. The remaining problem consists in assigning a due date $d_j \in D$ to each job in $J \setminus J_E$ such that none of these jobs costs more than $c_{\mathrm{thres}}$, all due dates together are feasible, and the overall cost is minimized. We express this as a linear program. In that LP, we have a variable $x_{j,t}$ for each pair of a job $j \in J \setminus J_E$ and a due date $t \in D$ such that $j$ does not cost more than $c_{\mathrm{thres}}$ when finishing at time $t$. We add the constraint $\sum_{t \in D} x_{j,t} = 1$ for each job $j$, modeling that the job has a due date, and one constraint for each interval $[r, t]$ with $r \in R$ and $t \in D$ to model the condition given by Lemma 6. See Appendix D for the full LP.
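Feasibility of a due date assignment can also be checked directly by simulating preemptive Earliest-Deadline-First, which meets all deadlines on one machine whenever any schedule does. The following is our own illustrative check, not the paper's LP encoding:

```python
import heapq

def edf_feasible(jobs):
    """jobs: list of (release, processing, due_date) triples.  Returns True
    iff some preemptive single-machine schedule meets every due date,
    simulated by running Earliest-Deadline-First between event points."""
    jobs = sorted(jobs)                                  # by release date
    times = sorted({r for r, _, _ in jobs} | {d for _, _, d in jobs})
    heap, i = [], 0                                      # [due date, remaining work]
    for t, nxt in zip(times, times[1:] + [times[-1]]):
        while i < len(jobs) and jobs[i][0] <= t:         # release jobs at t
            heapq.heappush(heap, [jobs[i][2], jobs[i][1]])
            i += 1
        cap = nxt - t                                    # capacity of [t, nxt)
        while cap > 0 and heap:
            work = min(cap, heap[0][1])                  # run earliest deadline
            heap[0][1] -= work
            cap -= work
            if heap[0][1] == 0:
                heapq.heappop(heap)
        if any(d <= nxt and rem > 0 for d, rem in heap):
            return False                                 # a due date has passed
    return True
```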
In polynomial time, we compute an extreme point solution $x^*$ of the LP. It has at most $|D| \cdot |R| + |J \setminus J_E|$ many non-zero variables. Each job $j$ needs at least one non-zero variable $x^*_{j,t}$, due to the constraint $\sum_{t \in D} x_{j,t} = 1$. Thus, there are at most $|D| \cdot |R|$ fractionally assigned jobs, i.e., jobs $j$ having a variable $x^*_{j,t}$ with $0 < x^*_{j,t} < 1$. We define an integral solution by rounding $x^*$ as follows: for each job $j$ we set $d_j$ to be the maximum value $t$ such that $x^*_{j,t} > 0$. We round up at most $|D| \cdot |R|$ jobs, and after the rounding each of them costs at most $c_{\mathrm{thres}}$. Hence, those jobs cost at most an $\varepsilon$-fraction of the cost of the guessed jobs $J_E$.
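The rounding step itself can be sketched as follows (`x_star` is a hypothetical map from (job, due date) pairs to LP values, not the paper's data structure):

```python
def round_due_dates(x_star):
    """Each job receives the latest due date t with x*_{j,t} > 0."""
    due = {}
    for (j, t), v in x_star.items():
        if v > 0 and (j not in due or t > due[j]):
            due[j] = t
    return due
```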

Lemma 7.
Denote by $c(x^*)$ the cost of the solution $x^*$. Then $c(x^*) + c(J_E)$ is a lower bound on the optimum.

Since the rounding increases the cost by at most $\varepsilon \cdot c(J_E)$, we obtain a $(1+\varepsilon)$-approximation. As there are quasi-polynomially many guesses for the expensive jobs and the remainder can be done in polynomial time, we obtain a QPTAS.

Theorem 4.
There is a QPTAS for GSP, assuming that each cost function $f_j$ can be expressed as $f_j = w_j \cdot g_{u(j)}$ for some job-dependent weight $w_j$ and at most $k = (\log n)^{O(1)}$ global functions $g_1, \ldots, g_k$, and that the jobs have at most $(\log n)^{O(1)}$ distinct release dates.

A Omitted proofs from Section 2
In order to prove Lemma 1, we formally introduce the notion of a profile. A profile $Q: E' \to \mathbb{R}_{\ge 0}$ assigns a height $Q(e)$ to each edge $e \in E'$, and a profile $Q$ dominates a profile $Q'$ if $Q(e) \ge Q'(e)$ holds for all $e \in E'$. The profile $Q_T$ induced by a set of tasks $T$ is defined by the heights $Q_T(e) := \sum_{i \in T_e} p_i$, where $T_e$ denotes all tasks in $T$ whose path contains the edge $e$. Finally, a set of tasks $T$ dominates a set of tasks $T'$ if $Q_T$ dominates $Q_{T'}$.

Lemma 1. Given a group $T_{(k,\ell)}$, there is a polynomial time algorithm which computes a set of task sets $\hat{\mathcal{T}}_{(k,\ell)}$ which contains a set $T^*_{(k,\ell)} \in \hat{\mathcal{T}}_{(k,\ell)}$ such that $c(T^*_{(k,\ell)}) \le (1+\varepsilon) \cdot c(\mathrm{OPT}_{(k,\ell)})$ and $T^*_{(k,\ell)}$ dominates $\mathrm{OPT}_{(k,\ell)}$.
Proof. In the first step, we guess the number of tasks in $\mathrm{OPT}_{(k,\ell)} := T_{(k,\ell)} \cap \mathrm{OPT}$. Abusing notation, we write $\mathrm{OPT}_{(k,\ell)}$ also for the total cost of the tasks in $\mathrm{OPT}_{(k,\ell)}$. If $|\mathrm{OPT}_{(k,\ell)}|$ is smaller than $\frac{1}{\varepsilon^2}$ then we can guess an optimal set $\mathrm{OPT}_{(k,\ell)}$ by enumeration. Otherwise, we will consider a polynomial number of certain approximate profiles, one of which underestimates the unknown true profile induced by $\mathrm{OPT}_{(k,\ell)}$ by at most $O(\varepsilon) \cdot \mathrm{OPT}_{(k,\ell)}$. For each approximate profile we will compute a cover of cost at most $1+O(\varepsilon)$ times the optimum, and in case the profile is close to the true profile, we can extend this solution to a cover of the true profile by adding only $O(\varepsilon) \cdot \mathrm{OPT}_{(k,\ell)}$ more in cost.
Several arguments in the remaining proof are based on the structure of $T_{(k,\ell)}$ and the resulting structure of the true profile $Q_{\mathrm{OPT}_{(k,\ell)}}$. Since all tasks in $T_{(k,\ell)}$ contain the edge $e_M$ and span a subpath of $E'$, the height of the profile $Q_{\mathrm{OPT}_{(k,\ell)}}$ is unimodal: it is non-decreasing until $e_M$ and non-increasing after that; see Figure 1. In particular, a task that covers a certain edge $e$ also covers all edges in between $e$ and $e_M$.
Moreover, aiming to approximate the true profile, we only take into account profiles whose edge heights are non-decreasing before and non-increasing after $e_M$ on the path. Let $H := \{ j \cdot \varepsilon \cdot |\mathrm{OPT}_{(k,\ell)}| \cdot (1+\varepsilon)^{\ell+1} : j \in \{1, \ldots, \frac{1}{\varepsilon}\} \}$ denote the set of allowed heights. Utilizing the natural ordering of the edges on the path, we formally define the set $\mathcal{Q}$ of approximate profiles to consist of all profiles $Q: E' \to H \cup \{0\}$ that are non-decreasing on the edges before $e_M$ and non-increasing on the edges after $e_M$. Since $|\mathrm{OPT}_{(k,\ell)}| \cdot (1+\varepsilon)^{\ell+1}$ is an upper bound on the maximum height of $Q_{\mathrm{OPT}_{(k,\ell)}}$, there is a profile $Q^* \in \mathcal{Q}$ which is dominated by $Q_{\mathrm{OPT}_{(k,\ell)}}$ and for which the gap $Q_{\mathrm{OPT}_{(k,\ell)}}(e) - Q^*(e)$ does not exceed $\varepsilon \cdot |\mathrm{OPT}_{(k,\ell)}| \cdot (1+\varepsilon)^{\ell+1}$ for all $e \in E'$. Observe that by construction, an approximate profile can have at most $|H|$ edges at which it jumps from one height to a larger one, and analogously at most $|H|$ edges at which it jumps down to a smaller height. Hence, $\mathcal{Q}$ contains at most $n^{2|H|} = n^{2/\varepsilon}$ profiles. For each approximate profile $Q \in \mathcal{Q}$, we compute a cover based on LP rounding. To this end, we denote by $e_L(h)$ and $e_R(h)$ the first and last edge $e \in E'$ for which $Q(e) \ge h$, respectively. Note that by the structure of the paths of the tasks in $T_{(k,\ell)}$, every set of tasks covering $e_L(h)$ also covers all edges between $e_M$ and $e_L(h)$ by at least the same amount, and analogously for $e_R(h)$. Regarding the LP formulation, this allows us to only require a sufficient covering of the edges $e_L(h)$ and $e_R(h)$ rather than of all edges. Denoting by $P_i$ the path of a task $i$, and by $x_i$ the decision variable representing its selection for the cover, we formulate the LP as minimizing $\sum_{i \in T_{(k,\ell)}} c_i x_i$ subject to $\sum_{i: e_L(h) \in P_i} p_i x_i \ge h$ and $\sum_{i: e_R(h) \in P_i} p_i x_i \ge h$ for each $h \in H$, and $0 \le x_i \le 1$ for all $i \in T_{(k,\ell)}$. If there exists a feasible solution to the LP, we round up all fractional values $x^*_i$ (i.e., values $x^*_i \in (0,1)$) of some optimal extreme point solution $x^*$, and we choose the corresponding tasks as a cover for $Q$ and denote them by $T^*$. Since the LP has only $2|H| = \frac{2}{\varepsilon}$ constraints apart from the variable bounds, its optimal extreme point solutions contain at most $\frac{2}{\varepsilon}$ fractional variables.
Hence, the additional cost incurred by the rounding does not exceed $\frac{2}{\varepsilon}\cdot(1+\varepsilon)^{k+1}$, where the latter term is the maximum task cost in $T^{(k,\ell)}$. Let us assume for calculating the cost of the computed solution that $Q = Q^*$. Then the cost of the selected tasks is at most $(1+O(\varepsilon))$ times the cost of $\mathrm{OPT}^{(k,\ell)}$: the cost of the fractional solution is bounded by the cost of $\mathrm{OPT}^{(k,\ell)}$ since $Q = Q^*$ is dominated by $Q_{\mathrm{OPT}^{(k,\ell)}}$, and the rounding cost is bounded using $|\mathrm{OPT}^{(k,\ell)}|\ge\frac{1}{\varepsilon^2}$ and the minimum task weight in $T^{(k,\ell)}$.
After covering $Q$ in the first step with $T^*$, in the second step we extend this cover by additional tasks $A^*\subseteq T^{(k,\ell)}\setminus T^*$. We define $A^*$ to consist of the $\varepsilon(1+\varepsilon)\cdot|\mathrm{OPT}^{(k,\ell)}|$ tasks in $T^{(k,\ell)}\setminus T^*$ with the leftmost start vertices and the $\varepsilon(1+\varepsilon)\cdot|\mathrm{OPT}^{(k,\ell)}|$ tasks in $T^{(k,\ell)}\setminus T^*$ with the rightmost end vertices. We add $T^*\cup A^*$ to the set $\bar{T}^{(k,\ell)}$.
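The construction of $A^*$ is a simple selection by endpoints. A minimal sketch, with tasks represented as (start, end) pairs (a toy encoding of ours, not the paper's):

```python
def extend_cover(tasks, chosen, k):
    """Return A*: among the tasks not in `chosen`, the k with the leftmost
    start vertices plus the k with the rightmost end vertices."""
    rest = [t for t in tasks if t not in chosen]
    by_start = sorted(rest, key=lambda t: t[0])[:k]
    by_end = sorted(rest, key=lambda t: t[1], reverse=True)[:k]
    return set(by_start) | set(by_end)

tasks = [(0, 4), (1, 6), (2, 9), (5, 7), (6, 8)]
a_star = extend_cover(tasks, chosen={(1, 6)}, k=1)
# picks the unchosen task starting leftmost and the one ending rightmost
assert a_star == {(0, 4), (2, 9)}
```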
Assume that $Q = Q^*$. Then the above LP has a feasible solution. We claim that the computed tasks $T^*\cup A^*$ dominate $\mathrm{OPT}^{(k,\ell)}$. Firstly, observe that any set of $\varepsilon(1+\varepsilon)\cdot|\mathrm{OPT}^{(k,\ell)}|$ tasks from $T^{(k,\ell)}$ has a total size of at least the gap between two height steps from $H$. Hence, if an edge $e$ is covered by that many tasks from $A^*$ and $Q = Q^*$, then we know that
$$Q_{T^*\cup A^*}(e) \;\ge\; Q^*(e) + \varepsilon\cdot|\mathrm{OPT}^{(k,\ell)}|\cdot(1+\varepsilon)^{\ell+1} \;\ge\; Q_{\mathrm{OPT}^{(k,\ell)}}(e).$$
On the other hand, if an edge $e$ is covered by fewer than $\varepsilon(1+\varepsilon)\cdot|\mathrm{OPT}^{(k,\ell)}|$ tasks from $A^*$, then there exists no further task in $T^{(k,\ell)}\setminus(T^*\cup A^*)$ whose path contains $e$; otherwise this would contradict the choice of $A^*$ as the $\varepsilon(1+\varepsilon)\cdot|\mathrm{OPT}^{(k,\ell)}|$ tasks with the leftmost start and rightmost end vertices, respectively. Thus, since in this second case $T^*\cup A^*$ contains all tasks that cover $e$, we have that $Q_{T^*\cup A^*}(e) \ge Q_{\mathrm{OPT}^{(k,\ell)}}(e)$.
Finally, the total cost of $A^*$ does not exceed $2\varepsilon(1+\varepsilon)\cdot|\mathrm{OPT}^{(k,\ell)}|\cdot(1+\varepsilon)^{k+1}$, and thus the total cost of $T^*\cup A^*$ is upper-bounded by $(1+O(\varepsilon))$ times the cost of $\mathrm{OPT}^{(k,\ell)}$. We complete the proof by redefining $\varepsilon$ appropriately. ⊓ ⊔

Theorem 2. For any $\varepsilon > 0$ there is a quasi-polynomial time $(e+\varepsilon)$-approximation algorithm for GSP with uniform release dates.
Proof. The heart of the proof is an $e$-approximation-preserving reduction from GSP with uniform release dates to UFP-cover. Although we develop a randomized algorithm here, we note that the reduction can be de-randomized using standard techniques. Given an instance of the scheduling problem, we construct an instance of UFP-cover as follows. For ease of presentation, we take our path $G = (V, E)$ to have vertices $0, 1, \ldots, P$; towards the end, we explain how to obtain an equivalent and more succinct instance. For each $i = 1, \ldots, P$, edge $e = (i-1, i)$ has demand $u_e = P - i$.
The reduction has two parameters, $\gamma > 1$ and $\alpha\in[0,1]$, which will be chosen later to minimize the approximation guarantee. For each job $j$, we define a sequence of times $t^j_0, t^j_1, t^j_2, \ldots, t^j_{k_j}$ starting at $0$ and ending at $P+1$ such that the cost of finishing a job between two consecutive times differs by at most a factor of $\gamma$. Formally, $t^j_0 = 0$, $t^j_{k_j} = P+1$, and $t^j_i$ is the first time step such that $f_j(t^j_i) > \gamma^{i-1+\alpha}$. For each $i > 0$ such that $t^j_{i-1} < t^j_i$, we create a task covering the interval $[t^j_{i-1}, t^j_i - 1]$ having demand $p_j$ and cost $f_j(t^j_i - 1)$. Given a feasible solution of the UFP-cover instance, we claim that we can construct a feasible schedule of no greater cost. For each job $j$, we consider the right-most task chosen in the UFP-cover solution (for feasibility, at least one task of each job must be chosen) and assign to $j$ a due date equal to the right endpoint of that task. Notice that the cost of finishing the jobs by their due dates equals the total cost of these right-most tasks. By the feasibility of the UFP-cover solution, it must be the case that for each time $t$, the total processing volume of jobs with a due date of $t$ or greater is at least $P - t + 1$. Therefore, scheduling the jobs according to earliest due date first yields a schedule that meets all the due dates. Therefore, the cost of the schedule is at most the cost of the UFP-cover solution.
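The breakpoint construction for a single job can be sketched as follows (a toy implementation of the stated definitions; the function and variable names are ours):

```python
def job_tasks(f, P, gamma, alpha):
    """Breakpoints t_0 = 0 < ... < t_k = P + 1, where t_i is the first time
    step with f(t_i) > gamma**(i - 1 + alpha); each pair of consecutive
    breakpoints yields a task [t_{i-1}, t_i - 1] of cost f(t_i - 1)."""
    breakpoints, tasks = [0], []
    i = 1
    while breakpoints[-1] < P + 1:
        thresh = gamma ** (i - 1 + alpha)
        t = next((s for s in range(P + 1) if f(s) > thresh), P + 1)
        if t > breakpoints[-1]:          # t_{i-1} < t_i: create a task
            tasks.append((breakpoints[-1], t - 1, f(t - 1)))
            breakpoints.append(t)
        i += 1
    return tasks

# toy cost function f_j(t) = t on a horizon P = 10, with gamma = 2, alpha = 0
assert job_tasks(lambda t: t, 10, 2, 0) == \
    [(0, 1, 1), (2, 2, 2), (3, 4, 4), (5, 8, 8), (9, 10, 10)]
```

Within each produced task, the cost of finishing anywhere in its interval varies by at most a factor of $\gamma$, as required by the reduction.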
Conversely, given a feasible schedule, we claim that if $\alpha$ is chosen uniformly at random in $[0,1]$ and we set $\gamma = e$, then there is a solution of the UFP-cover instance whose expected cost is at most $e$ times the cost of the schedule. For each job $j$, we pick all the tasks whose left endpoint is less than or equal to the completion time of $j$. It follows that the UFP-cover solution is feasible. Let $f_j(C_j)$ be the cost incurred by $j$. For a fixed $\alpha$, let the most expensive task induced by $j$ cost $f_j(C_j)\gamma^\beta$. Notice that $\beta$ is also uniformly distributed in $[0,1]$. The costs of the tasks induced by $j$ decrease geometrically by factors of $\gamma$, so their combined expected cost is at most
$$f_j(C_j)\cdot\mathbb{E}_\beta\Big[\sum_{i\ge 0}\gamma^{\beta-i}\Big] \;=\; f_j(C_j)\cdot\frac{\gamma-1}{\ln\gamma}\cdot\frac{\gamma}{\gamma-1} \;=\; f_j(C_j)\cdot\frac{\gamma}{\ln\gamma},$$
which is minimized at $\gamma = e$, where it equals $e\cdot f_j(C_j)$. By linearity of expectation, we get that the total cost of the UFP-cover solution is in expectation at most an $e$ factor larger than the cost of the schedule.
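The expected blowup per job works out to $\gamma/\ln\gamma$, since $\mathbb{E}[\gamma^\beta] = (\gamma-1)/\ln\gamma$ for $\beta$ uniform in $[0,1]$ and the geometric series contributes a factor $\gamma/(\gamma-1)$. A quick numeric sanity check (toy code, our names) confirms that the minimizer is $\gamma = e$:

```python
import math

def blowup(gamma):
    # E[gamma**beta] = (gamma - 1) / ln(gamma) for beta ~ U[0, 1]; the task
    # costs of one job decrease geometrically, contributing gamma/(gamma - 1),
    # so the expected blowup factor is gamma / ln(gamma)
    return gamma / math.log(gamma)

grid = [g / 1000 for g in range(1100, 5000)]
best = min(grid, key=blowup)
assert abs(best - math.e) < 0.001          # the grid minimum sits at gamma = e
assert abs(blowup(math.e) - math.e) < 1e-12
```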
To de-randomize the reduction, at the expense of adding another $\varepsilon'$ to the approximation factor, one can discretize the random variable $\alpha$, solve several instances, and return the best solution found. Finally, we mention that it is not necessary to construct the full path from $0$ to $P$. It is enough to keep the vertices where tasks start or end; stretches in which no task begins or ends can be contracted into a single edge whose demand equals the largest demand in that stretch.
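The succinct instance can be built by contracting vertex stretches. A sketch under the demand convention $u_{(i-1,i)} = P - i$ from above (the helper name and the (start, end) task encoding are ours):

```python
def compress(P, tasks):
    """Contract the path 0..P: keep only vertices where a task starts or
    ends; each contracted stretch becomes one edge whose demand is the
    largest demand of the original edges inside it."""
    keep = sorted({0, P} | {a for a, b in tasks} | {b for a, b in tasks})
    # edge (i-1, i) has demand P - i, so the maximum over a stretch (a, b)
    # is attained by its first original edge (a, a + 1)
    return [(a, b, P - (a + 1)) for a, b in zip(keep, keep[1:])]

assert compress(10, [(0, 3), (5, 8)]) == \
    [(0, 3, 9), (3, 5, 6), (5, 8, 4), (8, 10, 1)]
```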
Applying the $e$-approximation-preserving reduction and then running the $(1+\varepsilon)$-approximation algorithm of Theorem 1 finishes the proof.

B Omitted proofs from Section 3
In the following lemmas, we show different properties that we can assume at a speedup of $1+\varepsilon$. In fact, each property requires increasing the speed by another factor of $1+\varepsilon$; compared to the initial unit speed, the final speed is thus some power of $1+\varepsilon$. Technically, we consolidate the resulting polynomial in $\varepsilon$ into some $\varepsilon' = O(\varepsilon)$, achieving all properties of the lemmas at speed $1+\varepsilon'$.
Proof. Consider some job $j$ with completion time $C_j$ in an arbitrary schedule at unit speed. At speed $1+\varepsilon$, job $j$ completes by time $C_j^{(1+\varepsilon)} = C_j/(1+\varepsilon)$, and rounding this completion time up to the next power of $1+\varepsilon$ yields a value of at most $C_j$; hence, since the cost functions are non-decreasing, the ensued cost never exceeds the original cost.
Regarding the second point of the lemma, we observe that running a job $j$ of processing time $p_j$ at speed $1+\varepsilon$ allows for an additional idle time of length $\frac{\varepsilon}{1+\varepsilon}\cdot p_j$ compared to running it at unit speed. Hence, in case that $S_j < (1+\varepsilon)^{\lfloor\log_{1+\varepsilon}(\varepsilon p_j/(1+\varepsilon))\rfloor}$, we can set its start time to $(1+\varepsilon)^{\lfloor\log_{1+\varepsilon}(\varepsilon p_j/(1+\varepsilon))\rfloor}$ without exceeding its unit speed completion time.
Hence, we can make the assumptions of the lemma at a total speedup of $(1+\varepsilon)^2$, which is $1+O(\varepsilon)$ under our assumption that $\varepsilon < 1$, so the lemma follows.
⊓ ⊔ Lemma 3 is restated in a slightly stronger way; the statement given here immediately implies the version in the main part of the paper.
- At $1+\varepsilon$ speedup we can assume that each large job starts at some point in time $R_{t,k}$ and every interval $I_{t,k}$ is used by either only small jobs or by one large job, or it is empty.
- At $1+\varepsilon$ speedup we can assume that within each interval $I_t$ all small jobs are scheduled during a subinterval $I_{t,k,\ell}\subseteq I_t$ during which no large jobs are scheduled, and no small jobs are scheduled during $I_t\setminus I_{t,k,\ell}$.
Proof. Consider a small job that is started in $I_t$ and completed in some later interval. By definition, its length is at most $\varepsilon^3\cdot R_{t+1}$. At speed $1+\varepsilon$, the interval $I_t$ provides an additional idle time of length $\frac{\varepsilon}{1+\varepsilon}\cdot|I_t| = \frac{\varepsilon^2}{1+\varepsilon}\cdot R_t$, and the length of the small job reduces to at most $\varepsilon^3\cdot R_t$. Since for sufficiently small $\varepsilon$ it holds that $\frac{\varepsilon^2}{1+\varepsilon}\ge\varepsilon^3$, the small job can be scheduled during the idle time, and hence it finishes in $I_t$.
Regarding the second point of the lemma, we observe that the length of a large job starting during $I_t$ is at least $\varepsilon^3\cdot R_t$ by definition. When running a large job at speed $1+\varepsilon$, its processing time reduces by at least $\frac{\varepsilon^4}{1+\varepsilon}\cdot R_t$, which equals four times the gap between two consecutive values $R_{t,k}$. If $I_{t,k}$ and $I_{t,\ell}$ are the first and last intervals in $I_t$ used by some large job $j$ in a unit speed schedule then, at speed $1+\varepsilon$, we can start $j$ at time $R_{t,k+2}$, and it will finish no later than $R_{t,\ell-1}$, or it will finish in some later interval $I_s$, $s > t$. In case job $j$ finishes in $I_t$, the speedup allows us to assume that $j$ blocks the interval $[R_{t,k+2}, R_{t,\ell-1})$, and we know that no other job is scheduled in this interval.
Otherwise, if $j$ finishes in some later interval $I_s$, let $I_{s,m}$ be the subinterval of its completion. Since $j$ is not necessarily large in $I_s$, the reduction of its runtime due to the speedup may be only marginal with respect to $I_s$. In $I_{s,m}$, the job $j$ may be followed by a set of $s$-small jobs and an $s$-large job (both possibly not existing).
Analogously to the above argument, at a speedup of $1+\varepsilon$, we can start the $s$-large job at time $R_{s,m+2}$, and the interval $I_{s,m+1}$ becomes empty. We use this interval to schedule the small jobs from $I_{s,m}$. This delays their start; however, they still finish in $I_s$, which is sufficient: by the first part of Lemma 2, we can calculate the objective function as if every job finished at the next larger value $R_r$ after its actual completion time, i.e., at the end of the interval $I_r$ during which it finishes. Hence, within an interval $I_r$ we can rearrange the intervals $I_{r,k}$ without changing the cost. This completes the proof of the second part of the lemma.
The proof of the third part is a straightforward implication of the second part: we can assume that all small jobs are contained in intervals $I_{t,k}$ that contain no large jobs. Applying again the first part of Lemma 2, we can rearrange those intervals in such a way that they appear consecutively.
⊓ ⊔ Lemma 4. Given a fractional solution $(x, y)$ to sLP, in polynomial time we can compute a non-negative integral solution $(x', y')$ whose cost is not larger than the cost of $(x, y)$ and which fulfills the constraints (2), (3), (5), (6), (7) as well as a relaxed version (4a) of constraint (4).

Proof. The proof follows the general idea of [24]. Given some fractional solution $(x, y)$ to the sLP (2)-(7), we construct a fractional matching $M$ in a bipartite graph $G = (V\cup W, E)$. For each job $j\in J$ and for each large slot $s\in Q$, we introduce vertices $v_j\in V$ and $w_s\in W$, respectively. Moreover, for each slot of small jobs $t\in I$, we add $k_t := \sum_{j\in J} y_{t,j}$ vertices $w_{t,1},\ldots,w_{t,k_t}\in W$. We introduce an edge $(v_j, w_s)\in E$ with cost $f_j(R_{t(s)+1})$ for all job-slot pairs for which $x_{s,j} > 0$, and we choose it to an extent of $x_{s,j}$ for $M$. Regarding the vertices $w_{t,1},\ldots,w_{t,k_t}$, we add edges in the following way. We first sort all jobs $j$ with $y_{t,j} > 0$ in non-increasing order of their length $p_j$, and we assign them greedily to $w_{t,1},\ldots,w_{t,k_t}$; that is, we choose the first vertex $w_{t,\ell}$ which has not yet been assigned one unit of fractional jobs, we assign as much as possible of $y_{t,j}$ to it, and if necessary, we assign the remaining part to the next vertex $w_{t,\ell+1}$. Analogously to the above edges, we define the cost of an edge $(v_j, w_{t,\ell})$ to be $f_j(R_{t+1})$, and we add it fractionally to $M$ according to the fraction $y_{t,\ell,j}$ of $y_{t,j}$ that was assigned to $w_{t,\ell}$ by the greedy assignment. Note that $p^{\min}_{t,\ell}\ge p^{\max}_{t,\ell+1}$ for $\ell = 1,\ldots,k_t-1$, where $p^{\min}_{t,\ell}$ and $p^{\max}_{t,\ell}$ are the minimum and maximum length of all jobs (fractionally) assigned to $w_{t,\ell}$, respectively.
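The greedy slot-filling step can be sketched as follows (toy code; the dict-based representation and names are ours). It fills unit-capacity vertices $w_{t,1}, w_{t,2}, \ldots$ with job fractions in non-increasing order of length, which yields exactly the property $p^{\min}_{t,\ell}\ge p^{\max}_{t,\ell+1}$:

```python
def greedy_fill(y, p):
    """Assign fractions y[j] of jobs (sorted by non-increasing length p[j])
    to unit-capacity slots; returns a list of slots, each a list of
    (job, fraction) pairs."""
    slots, current, room = [], [], 1.0
    for j in sorted(y, key=lambda j: -p[j]):
        frac = y[j]
        while frac > 1e-9:
            take = min(frac, room)
            current.append((j, take))
            frac -= take
            room -= take
            if room <= 1e-9:               # slot full: open the next one
                slots.append(current)
                current, room = [], 1.0
    if current:
        slots.append(current)
    return slots

y = {"a": 0.5, "b": 1.0, "c": 0.5}         # fractions sum to k_t = 2
p = {"a": 5, "b": 3, "c": 2}
slots = greedy_fill(y, p)
# min length in slot l is at least the max length in slot l + 1
assert all(min(p[j] for j, _ in slots[l]) >= max(p[j] for j, _ in slots[l + 1])
           for l in range(len(slots) - 1))
```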
By construction, $M$ is in fact a fractional matching, i.e., for every vertex $v_j\in V$ the set $M$ contains edges whose chosen fractions add up to exactly $1$. Moreover, the total cost of $M$ equals the cost of the solution $(x, y)$. By standard matching theory, there also exists an integral matching $M'$ in $G$ whose cost does not exceed the cost of $M$, and since $G$ is bipartite, we can compute such a matching in polynomial time, see e.g. [21]. We translate $M'$ back into an integral solution $(x', y')$ of the LP, setting $y'_{t,j} = 1$ for every edge $(v_j, w_{t,\ell})$ in $M'$ and $x'_{s,j} = 1$ for every edge $(v_j, w_s)$ in $M'$. It remains to show that $(x', y')$ satisfies (2), (3), (4a), (5), (6) and (7). All constraints but (4a) are immediately satisfied by construction. In order to show that (4a) is satisfied, one bounds the total length of the jobs assigned to a slot $t$ using the ordering $p^{\min}_{t,\ell}\ge p^{\max}_{t,\ell+1}$ of the greedy assignment; the third inequality of the corresponding calculation follows from (6).

C Proof of Theorem 3 for general processing times
In this section, we provide the missing technical details which allow us to generalize the proof of Theorem 3 from polynomially bounded processing times to general processing times.
Theorem 3. There is a polynomial time algorithm for GSP with uniform release dates which computes a solution with optimal cost and which is feasible if the machine runs with speed $1+\varepsilon$.
We first prove that at 1 + ε speedup, we can assume that jobs "live" for at most O(log n) intervals, i.e., for each job j there are only O(log n) intervals between r(j) (the artificial release date) and C j . Then, we devise a dynamic program which moves on the time axis from left to right, considers blocks of O(log n) consecutive intervals at once and computes a schedule for them using the approach from Section 3.
Proof. By using $1+\varepsilon$ speedup, we create an idle time of length $\frac{\varepsilon}{1+\varepsilon}\cdot|I_t|$ in each interval $I_t$. Then the idle time during the interval $I_{t+s}$ with $s := \log_{1+\varepsilon}\frac{n}{\varepsilon^3} + 3$ can fit all jobs $j$ with $r(j)\le R_t$; the last inequality of the corresponding calculation is a consequence of Lemma 2, which bounds the processing time $p_j$ in the case of $r(j)\le R_t$. Since all jobs $i$ with $r(i)\le R_{t-1}$ can be assumed to be scheduled in the idle time of some earlier interval if necessary, we can assume $R_{t-1} < r(j)\le R_t$. In particular, it is sufficient to consider $s+2 = O_\varepsilon(\log n)$ intervals for processing a job.

⊓ ⊔
Throughout the remainder of this section we denote $K := \log_{1+\varepsilon}(q(n)) \in O_\varepsilon(\log n)$, where $q(n)$ is the polynomial from Lemma 8. Thus, $K$ bounds the number of intervals between the time $r(j)$ and the completion time $C_j$ of each job $j$.
If, after the assumption of Lemma 8, there is a point in time $s$ at which no job can be scheduled, i.e., there is no job $j$ with $s\in[r(j), r(j)\cdot q(n))$, then we divide the instance into two independent pieces.

Proposition 3. Without loss of generality we can assume that the union $\bigcup_j [r(j), r(j)\cdot q(n))$ of all these intervals is a (connected) interval.
For our dynamic program we subdivide the time axis into blocks. Each block $B_i$ consists of the intervals $I_{i\cdot K},\ldots,I_{(i+1)\cdot K-1}$. The idea is that in each iteration the DP schedules the jobs released during a block $B_i$ in the intervals of block $B_i$ and block $B_{i+1}$. So in the end, the intervals of each block $B_{i+1}$ contain jobs released during $B_i$ and $B_{i+1}$.
To separate the jobs from both blocks we prove the following lemma.
Lemma 9. At $1+\varepsilon$ speedup we can assume that during each interval $I_t$ in a block $B_{i+1}$ there are two subintervals $[a_t, b_t), [b_t, c_t)\subseteq I_t$ such that
- during $[a_t, b_t)$ only small jobs from block $B_i$ are scheduled, and during $I_t\setminus[a_t, b_t)$ no small jobs from block $B_i$ are scheduled,
- during $[b_t, c_t)$ only small jobs from block $B_{i+1}$ are scheduled, and during $I_t\setminus[b_t, c_t)$ no small jobs from block $B_{i+1}$ are scheduled,
- $a_t, b_t, c_t$ are of the form $(1 + z\cdot\frac{\varepsilon^4}{4(1+\varepsilon)^2})\cdot R_t$ for $z\in\{0, 1, \ldots, \lceil\frac{4(1+\varepsilon)^2}{\varepsilon^3}\rceil\}$.

Proof. Based on Lemma 3 we can assume that all small jobs that are started within $I_t$ also finish in $I_t$; moreover, they are processed in some interval $I_{t,k,\ell}\subseteq I_t$ which contains no large jobs (see Lemma 3 for the notation). By Lemma 8, the interval $I_t$ can be assumed to contain only small jobs with release date in $B_i$ and $B_{i+1}$, and by Lemma 2 we know that we can rearrange the jobs in $I_t$ without changing the cost. Hence, for proving the lemma it is sufficient to show that we can split $I_{t,k,\ell}$ at one of the discrete points given in the lemma such that the small jobs released in $B_i$ and $B_{i+1}$ are scheduled before and after this point, respectively.
The interval $I_{t,k,\ell}$ starts at $(1 + k\cdot\frac{\varepsilon^4}{4(1+\varepsilon)})\cdot R_t$ and its length is some integral multiple of $\frac{\varepsilon^4}{4(1+\varepsilon)}\cdot R_t$. At a speedup of $1+\varepsilon$, the interval $I_{t,k,\ell}$ provides additional idle time of length at least $\frac{\varepsilon^4}{4(1+\varepsilon)^2}\cdot R_t$ (if $I_{t,k,\ell}$ is not empty), which equals the step width of the discrete interval end points required in the lemma. Hence, by scheduling all small jobs released in $B_i$ and $B_{i+1}$ at the very beginning and very end of $I_{t,k,\ell}$, respectively, there must be a point in time $s$ of the required discrete form which lies in the idle interval between the two groups of small jobs. Finally, setting $a_t$ and $c_t$ to the start and end of $I_{t,k,\ell}$, respectively, and choosing $b_t := s$, we obtain intervals as claimed in the lemma.
⊓ ⊔ Using Lemma 8 we devise a dynamic program. We work again with patterns for the intervals. Here, a pattern for an interval $I_t$ in a block $B_i$ consists of $O_\varepsilon(1)$ integers which define
- the start and end times of the large jobs from $B_{i-1}$ which are executed during $I_t$,
- the start and end times of the large jobs from $B_i$ which are executed during $I_t$,
- $a_t, b_t, c_t$ according to Lemma 9, implying slots for small jobs.
Denote by $\bar{N}$ the number of possible patterns for an interval $I_t$ according to this definition. Similarly as in Proposition 1, we have that $\bar{N}\in O_\varepsilon(1)$ and that $\bar{N}$ is independent of $t$.
Each dynamic programming cell is characterized by a tuple $(B_i, P_i)$ where $B_i$ is a block during which at least one job is released (or the block thereafter), and $P_i$ denotes a pattern for all intervals of block $B_i$. For a pattern $P_i$, we denote by $Q_{i-1}(P_i)$ and $Q_i(P_i)$ the sets of slots in $B_i$ which are reserved for large jobs released in $B_{i-1}$ and $B_i$, respectively. Moreover, for some interval $I_t$ in $B_i$, let $D_{i-1,t}(P_i)$ and $D_{i,t}(P_i)$ be the two slots for small jobs from $B_{i-1}$ and $B_i$, respectively. The number of DP-cells is polynomially bounded since there are only $n$ blocks during which at least one job is released and, as in Section 3, the number of patterns for a block is bounded by $\bar{N}^{O_\varepsilon(\log n)}\subseteq n^{O_\varepsilon(1)}$.
The subproblem encoded in a cell $(B_i, P_i)$ is to schedule all jobs $j$ with $r(j)\ge R_{i\cdot K}$ during $[R_{i\cdot K},\infty)$ while obeying the pattern $P_i$ for the intervals $I_{i\cdot K},\ldots,I_{(i+1)\cdot K-1}$. To solve this subproblem, we first enumerate all possible patterns $P_{i+1}$ for the intervals of block $B_{i+1}$. Suppose that we guessed the pattern $P_{i+1}$ corresponding to the optimal solution of the subproblem given by the cell $(B_i, P_i)$. As in Section 3, we solve the problem of scheduling the jobs of block $B_i$ according to the patterns $P_i$ and $P_{i+1}$ by solving and rounding a linear program of the same type as sLP. Denote by $\mathrm{opt}(B_i, P_i, P_{i+1})$ the optimal solution to this subproblem.
Lemma 10. Given a DP-cell $(B_i, P_i)$ and a pattern $P_{i+1}$, there is a polynomial time algorithm which computes a solution to the problem of scheduling all jobs released during $B_i$ according to the patterns $P_i, P_{i+1}$ which does not cost more than $\mathrm{opt}(B_i, P_i, P_{i+1})$ and is feasible if during $B_i$ and $B_{i+1}$ the speed of the machine is increased by a factor of $1+\varepsilon$.
Proof. We solve a linear program analogous to sLP, where $J_i\subseteq J$ denotes the set of all jobs $j$ with $r(j)\in B_i$, and $i(t)$ is the index of the block containing the interval $I_t$. This LP has exactly the same structure as sLP (1)-(7), and hence we obtain an analogous result to Lemma 4: given a fractional solution $(x, y)$ to the above LP, we can construct an integral solution $(x', y')$ which is not more costly than $(x, y)$ and which fulfills all constraints (9)-(14), with (11) being replaced by the relaxed constraint
$$\sum_{j\in J_i} p_j\cdot y_{t,j} \;\le\; |D_{i,t}(P_{i(t)})| + \varepsilon\cdot|I_t| \qquad \forall\, t\in\{i\cdot K,\ldots,(i+2)\cdot K - 1\}.$$
However, at a speedup of $1+\frac{\varepsilon}{1-\varepsilon}\in 1+O(\varepsilon)$, an interval $I_t$ provides an additional idle time of $\varepsilon\cdot|I_t|$, which allows for scheduling the potential job volume of $\varepsilon\cdot|I_t|$ by which we may exceed the capacity of the interval. By Lemma 2, this does not increase the cost of the schedule, which concludes the proof.
⊓ ⊔ By definition of the patterns, an optimal solution $\mathrm{OPT}(B_{i+1}, P_{i+1})$ is independent of the patterns that have been chosen for earlier blocks. This is simply due to the separately reserved slots for jobs from different blocks within each pattern, i.e., a slot in $B_{i+1}$ which is reserved for jobs from $B_i$ cannot be used by jobs from $B_{i+1}$ in any case. Hence, $\mathrm{OPT}(B_i, P_i)$ decomposes into $\mathrm{OPT}(B_{i+1}, P_{i+1})$ and $\mathrm{opt}(B_i, P_i, P_{i+1})$ for a pattern $P_{i+1}\in\mathcal{P}_{i+1}$ which leads to the lowest cost, where $\mathcal{P}_{i+1}$ denotes the set of all possible patterns for block $B_{i+1}$. Thus, formally it holds that
$$\mathrm{OPT}(B_i, P_i) \;=\; \min_{P_{i+1}\in\mathcal{P}_{i+1}}\big\{\mathrm{OPT}(B_{i+1}, P_{i+1}) + \mathrm{opt}(B_i, P_i, P_{i+1})\big\}.$$
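The recurrence can be sketched as a memoized DP (toy stand-ins of ours for the patterns and the subproblem costs; `block_cost` plays the role of $\mathrm{opt}(B_i, P_i, P_{i+1})$), verified against brute-force enumeration of all pattern sequences:

```python
from functools import lru_cache
from itertools import product

def schedule_cost(num_blocks, patterns, block_cost):
    """OPT(i, P_i) = min over P_{i+1} of OPT(i+1, P_{i+1}) +
    block_cost(i, P_i, P_{i+1}); the last block has no successor pattern."""
    @lru_cache(maxsize=None)
    def OPT(i, P_i):
        if i == num_blocks - 1:
            return block_cost(i, P_i, None)
        return min(OPT(i + 1, Q) + block_cost(i, P_i, Q) for Q in patterns)
    return min(OPT(0, P) for P in patterns)

# toy instance: 3 blocks, 3 possible patterns, an arbitrary cost function
patterns = (0, 1, 2)
cost = lambda i, p, q: (p - (q or 0)) ** 2 + i
brute = min(sum(cost(i, seq[i], seq[i + 1] if i < 2 else None)
                for i in range(3))
            for seq in product(patterns, repeat=3))
assert schedule_cost(3, patterns, cost) == brute
```

The DP is correct here precisely because the cost of a block depends only on its own pattern and its successor's, mirroring the independence argument above.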